Validating genomic offset predictions with mortality rates in National Forest Inventory (NFI) plots

Published

April 4, 2025

Introduction

Goal: to estimate the association between genomic offset (GO) predictions and mortality rates in natural populations of maritime pine (i.e., in National Forest Inventory plots).

Mortality rates in natural populations are considered a proxy for the absolute fitness of the populations. A positive association between GO predictions and mortality rates would suggest that GO captures maladaptation despite demographic complexity (i.e., the effects of other processes on spatial variance in allele frequencies, such as expansion history and gene flow) and despite the complexity of the genetic architecture of climate adaptation (polygenic trait architectures, G×E interactions, non-additive genetic variance).

We use mortality data from the French and Spanish National Forest Inventories (NFI) harmonized in Changenet et al. (2021). The French data relies on temporary plots sampled between 2005 and 2014, while the Spanish data relies on permanent plots sampled during the second (1986-1996) and third (1997-2008) NFIs. A tree was recorded as dead if its death was dated to less than 5 years before the survey in the French NFI, or if it was alive in the second inventory but dead in the third one in the Spanish NFI.

The different census intervals among plots are accounted for by modelling the proportion \(p_{i}\) of maritime pines that died in plot \(i\) during the census interval \(\Delta_i\) with a complementary log-log link and an offset on the logarithm of the census interval. The modelling approach was evaluated with simulations in the report 14_ValidationNFI_ModelComparison.qmd, in which the estimation accuracy of different models was compared.
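This link-plus-offset construction is equivalent to assuming a constant mortality hazard within each plot; a minimal numerical sketch (with a hypothetical linear predictor \(\eta\)) makes the equivalence explicit:

```r
# Sketch only: with log(-log(1 - p)) = eta + log(Delta), survival over
# Delta years is exp(-exp(eta) * Delta), i.e. a constant annual hazard
# lambda = exp(eta). Both `eta` and the intervals are hypothetical values.
eta    <- -3                # hypothetical linear predictor
deltas <- c(5, 11)          # census intervals in years (as in the two NFIs)

p_offset <- 1 - exp(-exp(eta + log(deltas)))  # cloglog with log(Delta) offset
p_hazard <- 1 - exp(-exp(eta) * deltas)       # constant-hazard survival form
all.equal(p_offset, p_hazard)                 # TRUE: identical by construction
```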

In the present report, we evaluate the relationship between mortality rates and GO predictions using the following two models.

Model with competition as covariate

This model corresponds to the model M3 in the report 14_ValidationNFI_ModelComparison.html.

\[\begin{align*} m_i &\sim \text{Binomial}(N_i,p_i)\\ \text{log}(-\text{log}(1-p_i)) &= \beta_{0,c} + \beta_{C} C_i + \beta_{X} X_i + \text{log}(\Delta_i) \\ \end{align*}\]

  • \(N_{i}\) is the total number of maritime pines in the plot \(i\),

  • \(m_{i}\) is the number of maritime pines that died during the census interval \(\Delta_{i}\) in the plot \(i\),

  • \(\beta_{0,c}\) are country-specific intercepts that account for the methodological differences between the French and Spanish inventories that could otherwise bias the estimates,

  • \(C_{i}\) is the basal area of all tree species combined in the plot \(i\) (a proxy of the competition among trees), with its associated coefficient \(\beta_{C}\),

  • \(X_{i}\) is the climatic distance (between the reference climate and the climate of the inventory period) or the genomic offset predicted in the plot \(i\), with its associated coefficient \(\beta_{X}\). Note that in the scripts this variable is denoted \(GO_i\), and its regression coefficient \(\beta_{GO}\), because we initially did not include the climatic distances as predictors.

Model with competition and tree age as covariates

This model corresponds to the model M6_2 in the report 14_ValidationNFI_ModelComparison.html.

\[\begin{align*} m_i &\sim \text{Binomial}(N_i,p_i)\\ \text{log}(-\text{log}(1-p_i)) &= \beta_{0,c} + \beta_{C} C_i + \beta_{DBH} DBH_i + \beta_{C*DBH} C_i DBH_i + \beta_{X} X_i + \text{log}(\Delta_i) \\ \end{align*}\]

  • \(DBH_{i}\) is the mean diameter at breast height (DBH) of the maritime pines in the plot \(i\) (including adults, dead trees and recruitment trees), with its associated coefficient \(\beta_{DBH}\). This variable is expected to account for the differences in mean tree age across plots that may affect mortality rates.

We present the results from this model in the main manuscript.

Priors

We used the following weakly informative priors:

\[\begin{bmatrix} \beta_{0,c} \\ \beta_C \\ \beta_{X} \\ \beta_{DBH} \\ \beta_{C*DBH} \end{bmatrix} \sim \text{Normal}(0,1)\]
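As a quick sanity check of these priors (not part of the original analysis), the annual mortality probabilities implied by a \(\text{Normal}(0,1)\) intercept can be simulated through the inverse cloglog:

```r
# Hedged prior-predictive sketch: annual mortality probabilities (Delta = 1)
# implied by an intercept drawn from Normal(0, 1) on the cloglog scale.
set.seed(1)
alpha    <- rnorm(1e5, 0, 1)
p_annual <- 1 - exp(-exp(alpha))        # inverse cloglog
quantile(p_annual, c(0.05, 0.5, 0.95))  # spans low to near-certain mortality
```

The median implied probability is \(1 - e^{-1} \approx 0.63\), so these priors are weakly informative rather than skeptical about high mortality.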

NFI data

NFI data was harmonized in Changenet et al. (2021).

Code
nfi_data <- readRDS(here::here("data/NFIdata/NFIrawdata_Changenet2021.rds")) %>% 
  droplevels() %>% 
  as_tibble() %>% 
  dplyr::select(plotcode,
                longitude,
                latitude,
                country,
                yearsbetweensurveys, # number of years between surveys
                surveydate1, # date first survey Spanish NFI
                surveydate, # year of the second survey in the Spanish NFI and the unique survey in the French NFI
                treeNbrJ.M, # number of dead trees in the plot
                treeNbrJ.IMall, # total number of trees in the plot
                BA.ha.plot.1, # basal area of all tree species in the plot
                dbhJ.IMall.plot.mean # mean DBH of the maritime pines in the plot including dead trees but also saplings (recruitment) 
                ) %>% 
  dplyr::rename(nb_years = yearsbetweensurveys,
                nb_dead = treeNbrJ.M,
                nb_tot = treeNbrJ.IMall,
                basal_area = BA.ha.plot.1,
                mean_DBH = dbhJ.IMall.plot.mean,
                first_survey = surveydate1,
                second_survey = surveydate) %>% 
  dplyr::mutate(first_survey = case_when(!is.na(first_survey) ~ str_sub(first_survey,1,4) %>% as.numeric()), # keep only the year of the 1st survey
                prop_dead = nb_dead/nb_tot,  # proportion of dead trees
                annual_prop_dead = (nb_dead/nb_tot)/nb_years) # annual proportion of dead trees

Variables in the dataset

  • Plot code: plotcode
  • Longitude and latitude of the plot: longitude and latitude.
  • Country in which the plot is: country (ES = Spain, FR = France).
  • The number of years between surveys in the Spanish inventory (which is equal to 5 in the French inventory as mortality is estimated in the five years before the survey date): nb_years.
  • The year of the first survey in the Spanish inventory: first_survey (NA for the French plots, which have a single survey).
  • The year of the second survey in the Spanish inventory and of the unique survey in the French inventory: second_survey.
  • The number of dead trees in the plot: nb_dead.
  • The total number of trees in the plot: nb_tot.
  • The basal area of all tree species in the plot (proxy of the competition among trees): basal_area.
  • The mean DBH of the maritime pines in the plot including dead trees but also saplings (recruitment): mean_DBH.

There are 12610 plots in this dataset: 4097 from the French NFI and 8513 from the Spanish NFI.

Filtering

How many NAs in each column?

Code
nfi_data %>% 
  summarise(across(everything(), ~ sum(is.na(.)))) %>% 
  pivot_longer(everything(), names_to = "Variable", values_to = "Number of NAs") %>% 
  kable_mydf(boldfirstcolumn = T)
Variable Number of NAs
plotcode 0
longitude 0
latitude 0
country 0
nb_years 0
first_survey 4097
second_survey 0
nb_dead 0
nb_tot 0
basal_area 277
mean_DBH 664
prop_dead 664
annual_prop_dead 664

We extract the plots with missing data for the proportion of dead trees and we look at the first eight plots.

Code
df_na_prop_dead <- nfi_data %>% 
  dplyr::filter(is.na(prop_dead))
df_na_prop_dead[1:8,]  %>% kable_mydf(boldfirstcolumn = F)
plotcode longitude latitude country nb_years first_survey second_survey nb_dead nb_tot basal_area mean_DBH prop_dead annual_prop_dead
ES100047A1 -6.34 40.42 ES 11 1990 2001 0 0 2.40 NA NaN NaN
ES100094A1 -6.22 40.39 ES 11 1990 2001 0 0 3.52 NA NaN NaN
ES100129A1 -6.28 40.37 ES 11 1990 2001 0 0 NA NA NaN NaN
ES100151A1 -6.30 40.37 ES 11 1990 2001 0 0 NA NA NaN NaN
ES100203A1 -6.27 40.35 ES 11 1990 2001 0 0 6.11 NA NaN NaN
ES100206A1 -6.52 40.34 ES 11 1990 2001 0 0 NA NA NaN NaN
ES100232A1 -6.17 40.44 ES 11 1990 2001 0 0 NA NA NaN NaN
ES100307A1 -6.93 40.25 ES 11 1990 2001 0 0 NA NA NaN NaN

No maritime pine was recorded in these eight plots (nb_dead = 0 and nb_tot = 0). We check that this is the case for the 664 plots by looking at the unique values in the columns nb_dead and nb_tot in the subset of plots with missing data for the proportion of dead trees (the df_na_prop_dead dataset).

Code
df_na_prop_dead %>% 
  summarise(across(nb_dead:nb_tot, ~ unique(.))) %>% 
  pivot_longer(everything(), names_to = "Variable", values_to = "Unique values") %>% 
  kable_mydf(boldfirstcolumn = T)
Variable Unique values
nb_dead 0
nb_tot 0

We remove these 664 plots.

Code
nfi_data <- nfi_data %>% 
  dplyr::filter(!is.na(prop_dead))

We check that there are no more plots without maritime pine.

Code
nfi_data %>% dplyr::filter(nb_tot==0) %>% nrow()
[1] 0

Missing data for basal area:

Code
nfi_data %>% 
  summarise('Number of NAs for basal area' = sum(is.na(basal_area))) %>% 
  kable_mydf(boldfirstcolumn = F)
Number of NAs for basal area
29

Since there are not many plots with missing basal area data, we delete them.

Code
nfi_data <- nfi_data %>% 
  dplyr::filter(!is.na(basal_area))

After these filtering steps, there are 11917 plots left.

DRYAD REPOSITORY: The dataset is saved in the DRYAD repository: NFIdata_cleaned.csv dataset.

Code
nfi_data %>% write_csv(here("data/DryadRepo/NFIdata_cleaned.csv"))

Inventory dates

Code
country_names <- c(`ES` = "SPAIN",`FR` = "FRANCE")

ggplot(nfi_data, aes(x=second_survey, fill=as.factor(country))) + 
  geom_histogram(alpha=0.5, position="identity") + 
  theme_bw()  +
  facet_wrap(~as.factor(country), labeller = as_labeller(country_names)) + 
  labs(y="Count of plots",x="Inventory date") +
  theme(legend.position = "none")

Code
ggplot(nfi_data, aes(x=first_survey)) + 
  geom_histogram(alpha=0.5, position="identity") + 
  theme_bw()

Code
nfi_data %>% 
  ggplot(aes(x=nb_years, fill=country)) + 
  geom_histogram(alpha=0.5, position="identity") + 
  labs(y="Count of plots",x="Number of years between surveys") +
  theme_bw()

We visualize the annual proportion of mortality on a map.

Code
p <- ggplot() + 
    geom_sf(data = world, fill="gray98") + 
    theme_bw() +
    scale_x_continuous(limits = c(-10, 11)) +
    scale_y_continuous(limits = c(35, 51)) + 
    geom_point(data=nfi_data, aes(x=longitude,y=latitude,color=annual_prop_dead), size=1, alpha = 0.8) + 
  xlab("") + ylab("") +
    ggtitle(NULL) +
    theme(legend.position = c(0.8,0.2),
          axis.text = element_text(size=14),
          legend.box.background = element_rect(fill = "white", linewidth=0.6, colour="gray")) +
  geom_point(data=nfi_data %>% filter(annual_prop_dead>0.07),
             aes(x=longitude,y=latitude,color=annual_prop_dead), size=1.4, alpha = 0.3) +
  geom_point(data=nfi_data %>% filter(annual_prop_dead>0.13),
             aes(x=longitude,y=latitude,color=annual_prop_dead), size=1.4, alpha = 0.3) +
  scale_color_gradientn(name = "Observed annual\nproportion of mortality", 
                        colours = c("#00FF66", "#0066FF",rep("#CCFF00",2),rep("orange",3),rep("#FF0000",5)), limits=NULL)
Code
ggsave(plot = p, filename = here("figs/ValidationNFI/ObservedAnnualMortalityProportionMap.pdf"), height = 6, width = 6)

p

Climatic data

Generating file for ClimateDT

Code
# NFI plots coordinates
# ---------------------
nfi_climateDT <- nfi_data %>% 
  dplyr::select(plotcode, longitude, latitude) %>% 
  write_csv(here::here("data/nfi_coordinates.csv"))

We extract the climatic data from 10 years before the first survey to the date of the second survey. For the French NFI, as there is only one survey, we extract climatic data for the 15 years before the survey.

Code
nfi_data %>%
  group_by(country) %>%
  mutate(min_year = case_when(country=="FR" ~ second_survey-15,
                              country=="ES" ~ first_survey-10)) %>% 
  summarise(min_year=min(min_year),
            max_year=max(second_survey)) %>% 
  kable_mydf()
country min_year max_year
ES 1972 2008
FR 1990 2014

So, we need annual climatic data at the location of the NFI plots for the period 1972-2014.

Calculate climate for each plot

Once we have the annual climatic data at the location of the NFI plots, we generate two datasets:

  • one with the past climate data of the NFI plots, across the period 1901-1950.

  • one with the climatic data across a period specific to each plot, corresponding to:

    • The period from 5 years before the first inventory date to the second inventory date for the Spanish NFI.

    • The period from 10 years before the inventory date to the inventory date for the French NFI. A tree is considered dead in the French NFI if its death is estimated at less than 5 years before the inventory date.

We included the 5 years preceding the estimated date of tree death, as the climate of the previous years may have lag effects on tree death.

We load the annual climatic data, which correspond to point estimates at the locations of the NFI plots:

Code
clim <- read_csv(here("data/ClimaticData/NFIplots/ClimateDT_cmip6_GFDL-ESM4_NFIcoordinates_ADJ.csv"), show_col_types = FALSE) %>% 
  dplyr::rename(plotcode=ID,
                longitude=Longitude,
                latitude=Latitude,
                elevation=Elevation,
                year=Year)

We calculate the average climatic conditions across the period 1901-1950:

Code
# Function to calculate the average of the climatic variables at the location of the NFI plots
source(here("scripts/functions/calc_avg_clim_var.R"))

clim_ref <- clim %>% calc_avg_clim_var(ref_period = c(1901,1950), id_spatial_points = "plotcode")
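For readers without access to the sourced script, here is a minimal sketch of what we assume `calc_avg_clim_var()` does (an assumption, not the actual implementation): filter the annual data to the requested period and average each numeric climatic variable per plot.

```r
library(dplyr)

# Assumed behavior of calc_avg_clim_var() -- a sketch, not the sourced code.
calc_avg_clim_var_sketch <- function(clim, ref_period, id_spatial_points) {
  clim %>%
    filter(year >= ref_period[1], year <= ref_period[2]) %>%  # keep the period
    select(-year) %>%                                         # drop the year index
    group_by(across(all_of(id_spatial_points))) %>%           # one group per plot
    summarise(across(where(is.numeric), mean), .groups = "drop")
}
```

Called with `ref_period = c(1901, 1950)` and `id_spatial_points = "plotcode"`, this would return one row per plot with the 1901-1950 averages.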
Code
# !!Coding warning!! 
######################

# The coordinates from ClimateDT are not exactly the same as the original ones. 
# Therefore, merging the climatic data with the original data has to be done with the `plotcode` column 
# and not the latitude and longitude of the populations.

# Comparing original coordinates with the ones from ClimateDT
comp_coord <- clim_ref %>% 
  dplyr::select(plotcode,latitude,longitude) %>% 
  dplyr::rename(latitude_climDT=latitude, longitude_climDT=longitude)
comp_coord <- nfi_data %>% 
  dplyr::select(plotcode,latitude,longitude) %>% 
  inner_join(comp_coord, by="plotcode") %>% 
  mutate(diff_latitude=latitude_climDT - latitude,
         diff_longitude=longitude_climDT - longitude) %>% 
  dplyr::filter(diff_latitude != 0 | diff_longitude != 0)

# There are only very small differences
range(comp_coord$diff_latitude)
range(comp_coord$diff_longitude)
Code
# checking time periods for Maurizio
# (this chunk uses `clim_survey`, which is created in the next code chunk)
# ==================================

time_periods <- clim_survey %>% 
  dplyr::select(min_year_clim,second_survey) %>% 
  group_by(min_year_clim,second_survey) %>% 
  summarise(count=n()) 

time_periods %>% 
  print(n=nrow(.)) %>% 
  kable_mydf()

time_periods %>% filter(count <10)

time_periods %>% 
  write_csv(here("data/ClimaticData/NFIplots/NFIplots_TimePeriods.csv"))

We then calculate the average climatic conditions across the survey periods specific to each NFI plot:

Code
# We calculate the mean climatic values between:
  # - 5 years before the first inventory date (ES) or 10 years before the inventory date (FR)
  # - the (second) inventory date
clim_survey <- nfi_data %>% 
  mutate(min_year_clim = case_when(country=="FR" ~ second_survey-10,
                                   country=="ES" ~ first_survey-5))

# we calculate the average climatic conditions for each specific period
clim_survey <- clim_survey %>% 
  group_by(min_year_clim,second_survey) %>% 
  group_split() %>% 
  purrr::map(\(x){
  
min_year <- unique(x$min_year_clim)
max_year <- unique(x$second_survey)

clim %>% 
  right_join(x[,"plotcode"],by=c("plotcode")) %>%
  calc_avg_clim_var(ref_period = c(min_year,max_year), id_spatial_points = "plotcode")
  
}) %>% list_rbind() 


clim_ref <- clim_ref  %>% arrange(plotcode)
clim_survey <- clim_survey %>% arrange(plotcode)

# Checking correlations between past and inventory climates
# lapply(colnames(clim_ref)[-1], function(x){
#   tibble(variable_name = x,
#          cor_coeff = cor(clim_ref[[x]],clim_survey[[x]]))
#   
# }) %>% 
#   bind_rows() %>% 
#   print(n=nrow(.))


# we save in a list the average climatic conditions across the two periods (reference period and period of the NFI surveys)
list(clim_ref = clim_ref,
     clim_survey = clim_survey) %>% 
  saveRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))

DRYAD REPOSITORY: We save the averaged climatic data at the locations of the NFI plots in the DRYAD repository: the ClimateDT_NFIPlots_PastClimates.csv dataset for the averaged climates over the reference period 1901-1950 and the ClimateDT_NFIPlots_SurveyClimates.csv dataset for the averaged climates over the survey periods specific to each plot.

Code
clim_ref %>% write_csv(here("data/DryadRepo/ClimateDT_NFIPlots_PastClimates.csv"))
clim_survey %>% write_csv(here("data/DryadRepo/ClimateDT_NFIPlots_SurveyClimates.csv"))

Climatic space covered

We look at how the 34 sampled populations cover the climatic space of the NFI plots.

Code
# Selected climatic variables
clim_var <- readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))

# Reference climate of the NFI plots
clim_nfi <- clim_ref %>% 
  dplyr::select(plotcode,contains("ude"),elevation, all_of(clim_var)) %>% 
  dplyr::mutate(subdf="nfi") %>% 
  dplyr::rename(id=plotcode)

# Gene pools of the populations
gps <- readRDS(here("data/GenomicData/MainGenePoolInformation.rds"))[[1]] %>% arrange(pop)

# Reference climate of the populations
clim_pop <- readRDS(here("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_noADJ.rds"))[[1]]$ref_means %>% 
  dplyr::select(pop,contains("ude"),elevation, all_of(clim_var)) %>% 
  left_join(gps,by="pop") %>% 
  dplyr::rename(id=pop,
                subdf=main_gp_pop) 

# Combining the dataset of the populations and the NFI plots
df <- bind_rows(clim_pop,clim_nfi) %>% 
  mutate(subdf=factor(subdf,levels=c("nfi",unique(clim_pop$subdf))))


# Mean annual temperature vs annual precipitation
# ===============================================
p <- df %>%
  ggplot(aes(x=bio1,y=bio12,label=id)) +
  geom_point(data=df %>% filter(subdf=="nfi"), color="gray85") +
  geom_point(data=df %>% filter(!subdf=="nfi"), aes(color=subdf),size=4) +
  scale_color_manual(values=c("darkorchid3",
                              "gold2",
                              "navyblue",
                              "turquoise2",
                              "orangered3",
                              "green3")) +
  xlab("Mean annual temperature (bio1, °C)") +
  ylab("Annual precipitation (bio12, mm)") +
  geom_text_repel(data=df %>% filter(!subdf=="nfi")) +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.position = c(0.81,0.87),
        legend.text = element_text(size=16),
        legend.background = element_blank(),
        axis.text = element_text(size=15),
        axis.title = element_text(size=18))

p %>% ggsave(filename = here("figs/ValidationNFI/ClimaticCoverage_Bio1vsBio12.pdf"),
             device="pdf",
             width = 8,
             height=8)

p

Code
# Precipitation seasonality vs Isothermality    
# ==========================================
p <- df %>%
  ggplot(aes(x=bio15,y=bio3,label=id)) +
  geom_point(data=df %>% filter(subdf=="nfi"), color="gray85") +
  geom_point(data=df %>% filter(!subdf=="nfi"), aes(color=subdf),size=4) +
  scale_color_manual(values=c("darkorchid3",
                              "gold2",
                              "navyblue",
                              "turquoise2",
                              "orangered3",
                              "green3")) +
  xlab("Precipitation seasonality (bio15, coefficient of variation)") +
  ylab("Isothermality (bio3, index)") +
  geom_text_repel(data=df %>% filter(!subdf=="nfi")) +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.position = c(0.18,0.15),
        legend.text = element_text(size=16),
        legend.background = element_blank(),
        axis.text = element_text(size=15),
        axis.title = element_text(size=18))

p %>% ggsave(filename = here("figs/ValidationNFI/ClimaticCoverage_Bio3vsBio15.pdf"),
             device="pdf",
             width = 8,
             height=8)

p

Code
# Temperature seasonality and Summer heat moisture index    
# ======================================================
p <- df %>%
  ggplot(aes(x=bio4,y=SHM,label=id)) +
  geom_point(data=df %>% filter(subdf=="nfi"), color="gray85") +
  geom_point(data=df %>% filter(!subdf=="nfi"), aes(color=subdf),size=4) +
  scale_color_manual(values=c("darkorchid3",
                              "gold2",
                              "navyblue",
                              "turquoise2",
                              "orangered3",
                              "green3")) +
  xlab("Temperature seasonality (bio4, standard deviation ×100)") +
  ylab("Summer heat moisture index (SHM, °C/mm)") +
  geom_text_repel(data=df %>% filter(!subdf=="nfi")) +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.position = c(0.18,0.89),
        legend.text = element_text(size=16),
        legend.background = element_blank(),
        axis.text = element_text(size=15),
        axis.title = element_text(size=18))

p %>% ggsave(filename = here("figs/ValidationNFI/ClimaticCoverage_Bio4vsSHM.pdf"),
             device="pdf",
             width = 8,
             height=8)

p

Climatic distances

We calculate the climatic distances between the climate during the period 1901-1950 and the climate during the inventory period. The climatic distances were calculated separately for each climatic variable and we also calculated the Euclidean climatic distances based on all selected climatic variables.
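The two distance types can be illustrated on toy data (all values below are hypothetical; two plots, two variables):

```r
# Toy illustration of the two climatic distances (hypothetical values).
ref  <- data.frame(bio1 = c(10, 12), bio12 = c(800, 600))  # reference climate
surv <- data.frame(bio1 = c(11, 13), bio12 = c(750, 640))  # inventory climate

abs_dist <- abs(surv - ref)                # one distance per variable and plot
eucli    <- sqrt(rowSums((surv - ref)^2))  # one multivariate distance per plot
```

In the analysis, the Euclidean distance is computed on scaled variables, so that variables with large units (e.g. precipitation in mm) do not dominate.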

Code
# selected climatic variables
clim_var <- readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))

# Load the climatic data of the NFI plots.
nfi_clim <- readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds")) 
# nfi_clim[[1]] <- nfi_clim[[1]] %>% arrange(plotcode)
# nfi_clim[[2]] <- nfi_clim[[2]] %>% arrange(plotcode)
  
# Keep only the climatic variables of interest and scale the climatic data
source(here("scripts/functions/generate_scaled_nfi_clim_datasets.R"))
nfi_dfs <- generate_scaled_nfi_clim_datasets(clim_var, clim_ref = nfi_clim$clim_ref, clim_pred = nfi_clim$clim_survey)

clim_dist <- lapply(clim_var, function(var){(nfi_clim[[2]][[var]] - nfi_clim[[1]][[var]]) %>% abs()}) %>% 
  setNames(clim_var) %>% 
  bind_cols() %>%   # one column of absolute climatic distances per variable
  bind_cols(nfi_clim[[1]][,1],.)

# Calculating the Euclidean distances for all climatic variables
clim_dist$EucliDist <- map_dbl(1:nrow(nfi_dfs[[1]]),
                               ~ sqrt(sum((nfi_dfs[[1]][.x,clim_var] - nfi_dfs[[2]][.x,clim_var])^2)))

clim_dist <- clim_dist %>% 
  pivot_longer(cols=c(any_of(clim_var),"EucliDist"),names_to="snp_set_code",values_to="GO") %>% 
  mutate(method = "CD",
         method_snp_set_code = paste0("CD_",snp_set_code),
         snp_set_name = case_when(snp_set_code == "bio1" ~ "Mean annual temperature (bio1, °C)",
                                  snp_set_code == "bio12" ~ "Annual precipitation (bio12, mm)",
                                  snp_set_code == "bio15" ~ "Precipitation seasonality (bio15, coefficient of variation)",
                                  snp_set_code == "bio3" ~ "Isothermality (bio3, index)",
                                  snp_set_code == "bio4" ~ "Temperature seasonality (bio4, standard deviation ×100)",
                                  snp_set_code == "SHM" ~ "Summer heat moisture index (SHM, °C/mm)",
                                  snp_set_code == "EucliDist" ~ "All climatic variables (Euclidean distance)"),
         method_snp_set_name = paste0("CD - ", snp_set_name)) 

We show below maps of the difference between the climate during the inventory period and the climate during the reference period, for each climatic variable.

Code
lapply(clim_var, function(i){
  
df_diffclim <-  nfi_data %>% mutate(diff_clim_var = clim_survey[[i]]-clim_ref[[i]]) 

    ggplot() + 
    geom_sf(data = world, fill="gray98") + 
    theme_bw() +
    scale_x_continuous(limits = c(-10, 11)) +
    scale_y_continuous(limits = c(35, 51)) + 
    geom_point(data=df_diffclim, aes(x=longitude,y=latitude,color=diff_clim_var), size=1, alpha = 0.8) + 
  xlab("") + ylab("") +
    ggtitle(NULL) +
    theme(legend.position = c(0.8,0.2),
          axis.text = element_text(size=14),
          legend.box.background = element_rect(fill = "white", linewidth=0.6, colour="gray")) +
  scale_color_gradientn(name = i, 
                        colours = c("#00FF66", "#0066FF","#CCFF00","orange","#FF0000"), limits=NULL)

})
Genomic offset predictions

We load the genomic offset predictions estimated from the different GEAs.

Code
# Codes of the SNP sets
snp_sets <- readRDS(here("outputs/list_snp_sets.rds"))

# Which RDA method?
RDA_method <- "predict"

# List of methods
meth_names <- c("GDM", "GF", "LFMM", "pRDA", "RDA")

df <- lapply(snp_sets, function(x){

list_snps <- list()

list_snps$GDM <- readRDS(file=here("outputs/GDM/go_predictions.rds"))[[x$set_code]][["go_nfi"]]
list_snps$GF <- readRDS(file=here("outputs/GF/go_predictions.rds"))[[x$set_code]][["go_nfi"]]
list_snps$LFMM <- readRDS(file=here("outputs/LFMM/go_predictions.rds"))[[x$set_code]][["go_nfi"]]
list_snps$pRDA <- readRDS(file=here(paste0("outputs/RDA/go_predictions_",RDA_method,".rds")))[[x$set_code]][["go_nfi_corrected"]]
list_snps$RDA <- readRDS(file=here(paste0("outputs/RDA/go_predictions_",RDA_method,".rds")))[[x$set_code]][["go_nfi_uncorrected"]]


df <- list_snps %>% 
  bind_rows() %>% 
  mutate(snp_set_code = x$set_code,
         snp_set_name = x$set_name,
         plotcode = unique(nfi_data$plotcode))

return(df)
}) %>% 
  bind_rows() %>% 
  pivot_longer(cols= meth_names, names_to = "method", values_to ="GO") %>%
  mutate(method_snp_set_code = paste0(method, "_", snp_set_code),
         method_snp_set_name = paste0(method, " - ", snp_set_name)) %>% 
  bind_rows(clim_dist)

saveRDS(df, here("outputs/ValidationNFI/go_nfi_predictions.rds"))
Code
df <- readRDS(here("outputs/ValidationNFI/go_nfi_predictions.rds"))

# We do not use the genomic offset predictions based on SNPs without missing data:
df <- df %>% dplyr::filter(!snp_set_code=="control_without_nas")

Regression coefficients

We estimate the regression coefficients between genomic offset predictions and mortality rates using Bayesian hierarchical models.

Code
# Colors for the different sets of GO method (GDM, GF, LFMM and RDA) / SNP set (control and candidate SNPs, and all SNPs for LFMM)
colors_coeff <- c("#004586FF","#FF7F0FFF","#FFD320FF","#579D1CFF","#7E0021FF","#83CAFFFF","aquamarine3","#FEB5A2FF","#F02720FF","darkviolet","deeppink","cyan2","chocolate4")

Model with competition as covariate

Stan code

We did not run this model for the manuscript, so we only include its Stan code below.

Stan code of the model (model M3 in the report 14_ValidationNFI_ModelComparison.html):

Code
stancode = stan_model(here("scripts/StanModels/ValidationNFI_m3.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  vector[N] log_nb_years;                                                           // Offset to account for different census intervals
  vector[N] C;                                                                      // Proxy of the competition among trees (i.e. basal area)
  vector[N] GO;                                                                     // Genomic offset
  int nb_dead[N];                                                                   // Number of dead trees in the plot
  int nb_tot[N];                                                                    // Total number of trees in the plot
  int<lower=0> nb_country;                                                          // Number of countries
  int<lower=0, upper=nb_country> country[N];                                        // Countries
}
parameters {
  vector[nb_country] alpha_country;
  real beta_GO;
  real beta_C;
}
model {
  nb_dead ~ binomial(nb_tot,inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C  + log_nb_years));

  alpha_country ~ normal(0, 1);
  beta_GO ~ normal(0, 1);
  beta_C ~ normal(0, 1);
} 

Model with competition and tree age as covariates

Stan code

Stan code of the model (model M6_2 in the report 14_ValidationNFI_ModelComparison.html):

Code
stancode = stan_model(here("scripts/StanModels/ValidationNFI_m6_2.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  vector[N] log_nb_years;                                                           // Offset to account for different census intervals
  vector[N] C;                                                                      // Proxy of the competition among trees (i.e. basal area)
  vector[N] DBH;                                                                    // Proxy of the average tree age (i.e. mean DBH)
  vector[N] GO;                                                                     // Genomic offset
  int nb_dead[N];                                                                   // Number of dead trees in the plot
  int nb_tot[N];                                                                    // Total number of trees in the plot
  int<lower=0> nb_country;                                                          // Number of countries
  int<lower=0, upper=nb_country> country[N];                                        // Countries
}
parameters {
  vector[nb_country] alpha_country;
  real beta_GO;
  real beta_C;
  real beta_DBH;
  real beta_C_DBH;
}
model {
  nb_dead ~ binomial(nb_tot,inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C  + beta_DBH * DBH + beta_C_DBH * C .* DBH + log_nb_years));

  alpha_country ~ normal(0, 1);
  beta_GO ~ normal(0, 1);
  beta_C ~ normal(0, 1);
  beta_DBH ~ normal(0, 1);
  beta_C_DBH ~ normal(0, 1);
} 
Code
# Parameters to estimate
params_to_estimate <- c("alpha_country","beta_GO","beta_C", "beta_DBH","beta_C_DBH")

Running the model

Code
# Running one model for each of the nine sets of method / SNP set
#unique(df$method_snp_set_code)[!grepl(c("CD|all_cand|GDM_common_cand"), unique(df$method_snp_set_code))]
#unique(df$method_snp_set_code)[grepl(c("CD"), unique(df$method_snp_set_code))]
lapply(unique(df$method_snp_set_code), function(method_snp_set_i){

  # Subset the data - keeping only one set of method / SNP set
  sub_data <- df %>% 
    filter(method_snp_set_code == method_snp_set_i) %>% 
    inner_join(nfi_data, by="plotcode")
      
  # Data in a list for Stan 
  stanlist <- list(N = nrow(sub_data),
                   nb_dead = sub_data$nb_dead,
                   nb_tot=sub_data$nb_tot,
                   GO = (sub_data$GO - mean(sub_data$GO))/sd(sub_data$GO),
                   C = (sub_data$basal_area - mean(sub_data$basal_area))/sd(sub_data$basal_area),
                   DBH = (sub_data$mean_DBH - mean(sub_data$mean_DBH))/sd(sub_data$mean_DBH),
                   nb_country=length(unique(sub_data$country)),
                   country=as.numeric(as.factor(sub_data$country)),
                   log_nb_years=log(sub_data$nb_years))

       

  # Running the Stan model
  mod <- sampling(stancode, 
                  data = stanlist, 
                  iter = 3000, 
                  chains = 4, 
                  cores = 4,
                  init=0, 
                  save_warmup = FALSE,
                  warmup = 1500, 
                  pars = params_to_estimate,
                  verbose =T)

  # Save the model and the stanlist
  list(mod = mod, stanlist = stanlist) %>% 
    saveRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_i,".rds")))

  })
Code
coefftab <- lapply(unique(df$method_snp_set_code), function(method_snp_set_i){

  # Subset the data - keeping only one set of method / SNP set
  sub_data <- df %>% 
    filter(method_snp_set_code == method_snp_set_i)
  
 list_mod <- readRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_i,".rds")))
 
 # Model coefficients
 conf95 <- broom.mixed::tidyMCMC(list_mod$mod,
                                  pars=params_to_estimate,
                                  droppars = NULL, 
                                  robust = FALSE, # give mean and standard deviation
                                  ess = T, # effective sample size estimates
                                  rhat = T, # Rhat estimates
                                  conf.int = T, conf.level = 0.95 # include 95% credible intervals
                                  ) %>% 
    dplyr::rename(conf95_low=conf.low,
                  conf95_high=conf.high,
                  mean=estimate,
                  std_deviation=std.error)
  
  broom.mixed::tidyMCMC(list_mod$mod,
                        pars=params_to_estimate,
                        droppars = NULL, 
                        robust = TRUE, # give the median and the median absolute deviation (MAD)
                        ess = F, 
                        rhat = F, 
                        conf.int = T, 
                        conf.level = 0.80 # include 80% credible intervals
                        ) %>% 
    dplyr::rename(conf80_low=conf.low,
                  conf80_high=conf.high,
                  median=estimate,
                  mean_absolute_deviation=std.error) %>%
    inner_join(conf95,by=c("term")) %>% 
    mutate(method_snp_set_code = method_snp_set_i,
           method_snp_set_name = unique(sub_data$method_snp_set_name),
           snp_set_name = unique(sub_data$snp_set_name),
           snp_set_code = unique(sub_data$snp_set_code),
           method = unique(sub_data$method),
           .before=1) %>% 
    saveRDS(file=here(paste0("outputs/ValidationNFI/StanModels/ModelM62_",method_snp_set_i,"_coefftab.rds")))
  
})
Code
coefftab <- lapply(unique(df$method_snp_set_code), function(method_snp_set_i){
  readRDS(file=here(paste0("outputs/ValidationNFI/StanModels/ModelM62_",method_snp_set_i,"_coefftab.rds")))
  }) %>% 
  bind_rows() 
  
coefftab %>% mutate(snp_set_name = factor(snp_set_name, levels = c(unique(coefftab$snp_set_name)[[6]],
                                                        unique(coefftab$snp_set_name)[[4]],
                                                        unique(coefftab$snp_set_name)[[5]],
                                                        unique(coefftab$snp_set_name)[[3]],
                                                        unique(coefftab$snp_set_name)[[1]],
                                                        unique(coefftab$snp_set_name)[[2]],
                                                        unique(coefftab$snp_set_name)[[7]],
                                                        unique(coefftab$snp_set_name)[[8]],
                                                        unique(coefftab$snp_set_name)[[9]],
                                                        unique(coefftab$snp_set_name)[[10]],
                                                        unique(coefftab$snp_set_name)[[11]],
                                                        unique(coefftab$snp_set_name)[[12]],
                                                        unique(coefftab$snp_set_name)[[13]])),
          method = factor(method, levels = c(unique(coefftab$method)[[1]],
                                             unique(coefftab$method)[[2]],
                                             unique(coefftab$method)[[3]],
                                             unique(coefftab$method)[[4]],
                                             unique(coefftab$method)[[5]],
                                             unique(coefftab$method)[[6]]))) %>% 
  saveRDS(file=here("outputs/ValidationNFI/ModelM62_coefftab.rds"))

Predictions of tree mortality probability

Plots

Let’s visualize the uncertainty around the estimation of the probability of tree mortality \(p\).

Code
############################################################################################
# Visualizing the relationships between GO predictions and mortality probability predictions
############################################################################################
selected_method_snp_set <- unique(df$method_snp_set_code)[!grepl("CD", unique(df$method_snp_set_code))]

myplots <- lapply(selected_method_snp_set, function(method_snp_set_code_i){

# We load the stanlist
stanlist <- readRDS(file = here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_code_i,".rds")))$stanlist

# We load the model
mod <- readRDS(file = here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_code_i,".rds")))$mod

# We extract the posterior distributions of all parameters
post <- as.data.frame(mod)

# Vector of predictor values (based on the min and max of the GO predictions)
x_vec <- seq(min(stanlist$GO),max(stanlist$GO),0.05)

# Function to predict the mortality probability p, with competition (C) and mean DBH fixed to their means and the census interval fixed to 5 years (the log(5) offset)
p_pred <- lapply(c("alpha_country[1]","alpha_country[2]"), function(country_i){
  
  f_p <- function(x) 1 -(exp(-exp(post[,country_i] + 
                                    post$`beta_DBH` * mean(stanlist$DBH) + 
                                    post$`beta_C` * mean(stanlist$C) + 
                                    post$`beta_C_DBH` * mean(stanlist$C) * mean(stanlist$DBH) + 
                                    post$`beta_GO` * x +
                                    log(5))))
  
sapply(x_vec, f_p) %>% 
    tibble::as_tibble() %>%
    rename_all(function(x) x_vec) %>%
    mutate(Iter = row_number()) %>%
    gather(x, p, -Iter) %>%
    group_by(x) %>%
    mutate(hpdi_l = HDInterval::hdi(p, credMass = 0.90)[1],
           hpdi_h = HDInterval::hdi(p, credMass = 0.90)[2],
           p_mean = mean(p)) %>%
    ungroup() %>%
    mutate(x = as.numeric(x),
           Country=ifelse(country_i=="alpha_country[1]","Spain","France"))
  
}) %>% bind_rows()


# Keep the mean and 90% HPDI of p (one value per GO value and country)
p_mean_df <- p_pred %>%
  dplyr::select(x,hpdi_l,hpdi_h,p_mean,Country) %>% 
  dplyr::distinct()


# Plots
#=======

# Plot options 
y_limits <- c(0,0.2)
x_axis <- "Mean-centered GO predictions"
y_axis <- "Probability of tree mortality"

p <- ggplot() +
  ylim(y_limits) +
  labs(y=y_axis, x=x_axis) +
  theme_bw() +
  theme(legend.position=c(0.2,0.85),
        legend.text = element_text(size=13),
        legend.title = element_text(size=13),
        legend.background = element_blank())

# First 100 posterior draws of the mortality probability p
p1 <- p +
  geom_point(data = p_pred %>% filter(Iter < 101),
             aes(x, p, color=Country), alpha = .2) 
  
# Mean (line) and 90% HPDI (shaded region) of the mortality probability p
p2 <- p +
  geom_ribbon(data = p_mean_df,
              aes(x = x, ymin = hpdi_l, ymax = hpdi_h, fill=Country),
              alpha = .2) +
  geom_line(data = p_mean_df,
              aes(x=x, y=p_mean,col=Country))

p1_p2 <- ggarrange(p1, p2 + labs(y=""), nrow = 1)

# Title
text_title <- df %>% filter(method_snp_set_code == method_snp_set_code_i) %>% .[["method_snp_set_name"]] %>% unique()

title <- ggdraw() + 
  draw_label(text_title,
             fontface = 'bold',
             x = 0, hjust = 0) +   
  theme(plot.margin = margin(0, 0, 0, 7))

# merge title and plots
p1_p2 <- plot_grid(title, p1_p2, ncol = 1, rel_heights = c(0.1, 1))




ggsave(p1_p2,
       file=here(paste0("figs/ValidationNFI/MortalityProbabilityPredictions_UncertaintyIntervals/M62_",method_snp_set_code_i,".pdf")),
       device="pdf",
       height=6,
       width=11)

# Run these lines to build a figure combining the p2 plots of different method/SNP sets.
# p2 <- p2 + labs(subtitle = text_title)
# return(p2)

})

# p <- plot_grid(myplots[[1]] + labs(x="") + theme(legend.position = "none"),
#                myplots[[2]] + labs(y="",x="") + theme(legend.position = "none"),
#                myplots[[3]] + labs(y="",x="") + theme(legend.position = c(0.8,0.85)),
#                myplots[[4]] + labs(x="") + theme(legend.position = "none"),
#                myplots[[5]] + labs(y="",x="") + theme(legend.position = "none"),
#                myplots[[6]] + labs(y="",x="") + theme(legend.position = "none"),
#                myplots[[7]] + theme(legend.position = "none"),
#                myplots[[8]] + labs(y="") + theme(legend.position = "none"),
#                myplots[[9]] + labs(y="") + theme(legend.position = "none"),
#                ncol=3)
# 
# ggsave(p,
#        file=here(paste0("figs/ValidationNFI/MortalityProbabilityPredictions_M62.pdf")),
#        device="pdf",
#        height=12,
#        width=12)
# 
# p

Table

In the table below:

  • the mean and 90% credible intervals (highest posterior density intervals) of \(p\) for \(x \in \{-1, 0, 1\}\).

  • the percent change in \(p\) associated with a one-unit increase in \(x\) (i.e., a one-standard-deviation increase in the GO predictions).
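Because the complementary log-log link is nonlinear, a one-unit increase in \(x\) multiplies the mortality hazard by \(e^{\beta_{GO}}\) rather than shifting \(p\) by a constant amount. A minimal sketch of how the percent change is obtained (with illustrative coefficient values, not estimates from the fitted models):

```r
# Mortality probability under the cloglog link with the census interval fixed
# to 5 years, as in f_p(): p = 1 - exp(-exp(eta + log(5)))
p_cloglog <- function(eta, delta = 5) 1 - exp(-exp(eta + log(delta)))

alpha   <- -4    # illustrative baseline (intercept + covariates at their means)
beta_GO <- 0.1   # illustrative GO effect, not an estimate from the models

p0 <- p_cloglog(alpha)            # p at x = 0
p1 <- p_cloglog(alpha + beta_GO)  # p at x = 1 (one SD above the mean GO)

# Percent change in p, as reported in the +Δp column of the table
delta_p_pos <- (p1 * 100 / p0) - 100
```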

Code
p_pred <- lapply(unique(df$method_snp_set_code), function(method_snp_set_code_i){

# We load the stanlist
stanlist <- readRDS(file = here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_code_i,".rds")))$stanlist

# We load the model
mod <- readRDS(file = here(paste0("outputs/ValidationNFI/StanModels/M62_",method_snp_set_code_i,".rds")))$mod

# We extract the posterior distributions of all parameters
post <- as.data.frame(mod)

# Predictor values at which to predict: one standard deviation below the mean, the mean, and one standard deviation above
x_vec <- c(-1,0,1)

# Function to predict the mortality probability p, with competition (C) and mean DBH fixed to their means and the census interval fixed to 5 years (the log(5) offset)
p_pred <- lapply(c("alpha_country[1]","alpha_country[2]"), function(country_i){
  
  f_p <- function(x) 1 -(exp(-exp(post[,country_i] + 
                                    post$`beta_DBH` * mean(stanlist$DBH) + 
                                    post$`beta_C` * mean(stanlist$C) + 
                                    post$`beta_C_DBH` * mean(stanlist$C) * mean(stanlist$DBH) + 
                                    post$`beta_GO` * x +
                                    log(5))))
  
sapply(x_vec, f_p) %>% 
    tibble::as_tibble() %>%
    rename_all(function(x) x_vec) %>%
    mutate(Iter = row_number()) %>%
    gather(x, p, -Iter) %>%
    group_by(x) %>%
    mutate(hpdi_l = HDInterval::hdi(p, credMass = 0.90)[1],
           hpdi_h = HDInterval::hdi(p, credMass = 0.90)[2],
           p_mean = mean(p)) %>%
    ungroup() %>%
    mutate(x = as.numeric(x),
           Country=ifelse(country_i=="alpha_country[1]","Spain","France"))
  
}) %>%  
  bind_rows() %>% 
  dplyr::select(x,p_mean,hpdi_l,hpdi_h,Country) %>% 
  distinct() %>% 
  mutate_if(is.numeric, round,  digits=3)

lapply(unique(p_pred$Country), function(country_i){
  
p_pred_sub <- p_pred %>% dplyr::filter(Country==country_i)

delta_p_pos <- (p_pred_sub$p_mean[p_pred_sub$x==1] * 100 / p_pred_sub$p_mean[p_pred_sub$x==0])-100
delta_p_neg <- (p_pred_sub$p_mean[p_pred_sub$x==-1] * 100 / p_pred_sub$p_mean[p_pred_sub$x==0])-100

tibble("Method" = df %>% filter(method_snp_set_code == method_snp_set_code_i) %>% .[["method_snp_set_name"]] %>% unique(),
       "Country"=country_i,
         "p(X=-1)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==-1], " [",
                       p_pred_sub$hpdi_l[p_pred_sub$x==-1],";",
                       p_pred_sub$hpdi_h[p_pred_sub$x==-1],"]"),
       "p(X=0)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==0], " [",
                       p_pred_sub$hpdi_l[p_pred_sub$x==0],";",
                       p_pred_sub$hpdi_h[p_pred_sub$x==0],"]"),
       "p(X=1)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==1], " [",
                       p_pred_sub$hpdi_l[p_pred_sub$x==1],";",
                       p_pred_sub$hpdi_h[p_pred_sub$x==1],"]"),
       "+Δp"=paste0(round(delta_p_pos,3),"%"),
       "-Δp"=paste0(round(delta_p_neg,3),"%"))
  
}) %>% bind_rows()

})  %>% bind_rows()

p_pred %>% saveRDS(file = here("tables/MortalityPredNFI.rds"))
Code
readRDS(file = here("tables/MortalityPredNFI.rds"))  %>% kable_mydf()
Method Country p(X=-1) p(X=0) p(X=1) +Δp -Δp
GDM - All candidate SNPs (380) Spain 0.099 [0.098;0.101] 0.113 [0.111;0.114] 0.128 [0.126;0.13] 13.274% -12.389%
GDM - All candidate SNPs (380) France 0.043 [0.041;0.045] 0.049 [0.047;0.051] 0.056 [0.053;0.058] 14.286% -12.245%
GF - All candidate SNPs (380) Spain 0.115 [0.114;0.117] 0.111 [0.109;0.112] 0.106 [0.104;0.109] -4.505% 3.604%
GF - All candidate SNPs (380) France 0.054 [0.051;0.056] 0.052 [0.049;0.054] 0.049 [0.047;0.052] -5.769% 3.846%
LFMM - All candidate SNPs (380) Spain 0.131 [0.129;0.133] 0.111 [0.11;0.113] 0.095 [0.093;0.096] -14.414% 18.018%
LFMM - All candidate SNPs (380) France 0.059 [0.056;0.062] 0.05 [0.048;0.052] 0.042 [0.04;0.044] -16% 18%
pRDA - All candidate SNPs (380) Spain 0.094 [0.092;0.095] 0.119 [0.117;0.12] 0.15 [0.147;0.152] 26.05% -21.008%
pRDA - All candidate SNPs (380) France 0.033 [0.031;0.035] 0.042 [0.04;0.044] 0.054 [0.052;0.057] 28.571% -21.429%
RDA - All candidate SNPs (380) Spain 0.081 [0.08;0.082] 0.132 [0.131;0.133] 0.211 [0.207;0.215] 59.848% -38.636%
RDA - All candidate SNPs (380) France 0.018 [0.017;0.019] 0.03 [0.028;0.031] 0.049 [0.047;0.052] 63.333% -40%
GDM - Common candidate SNPs (69) Spain 0.114 [0.113;0.116] 0.112 [0.111;0.114] 0.11 [0.108;0.112] -1.786% 1.786%
GDM - Common candidate SNPs (69) France 0.051 [0.049;0.054] 0.05 [0.048;0.053] 0.05 [0.047;0.052] 0% 2%
GF - Common candidate SNPs (69) Spain 0.115 [0.114;0.117] 0.111 [0.11;0.112] 0.107 [0.105;0.109] -3.604% 3.604%
GF - Common candidate SNPs (69) France 0.054 [0.051;0.056] 0.052 [0.049;0.054] 0.05 [0.047;0.052] -3.846% 3.846%
LFMM - Common candidate SNPs (69) Spain 0.146 [0.144;0.148] 0.109 [0.108;0.11] 0.081 [0.079;0.083] -25.688% 33.945%
LFMM - Common candidate SNPs (69) France 0.067 [0.064;0.07] 0.049 [0.047;0.051] 0.036 [0.035;0.038] -26.531% 36.735%
pRDA - Common candidate SNPs (69) Spain 0.082 [0.081;0.084] 0.114 [0.113;0.116] 0.157 [0.155;0.16] 37.719% -28.07%
pRDA - Common candidate SNPs (69) France 0.032 [0.03;0.033] 0.045 [0.042;0.046] 0.062 [0.059;0.065] 37.778% -28.889%
RDA - Common candidate SNPs (69) Spain 0.077 [0.076;0.078] 0.119 [0.118;0.121] 0.183 [0.181;0.186] 53.782% -35.294%
RDA - Common candidate SNPs (69) France 0.023 [0.022;0.024] 0.037 [0.035;0.038] 0.058 [0.055;0.06] 56.757% -37.838%
GDM - Candidate SNPs considering pop. struct. correction (221) Spain 0.16 [0.158;0.162] 0.111 [0.11;0.112] 0.077 [0.075;0.078] -30.631% 44.144%
GDM - Candidate SNPs considering pop. struct. correction (221) France 0.065 [0.062;0.068] 0.044 [0.042;0.047] 0.03 [0.029;0.032] -31.818% 47.727%
GF - Candidate SNPs considering pop. struct. correction (221) Spain 0.132 [0.13;0.134] 0.106 [0.104;0.107] 0.084 [0.083;0.086] -20.755% 24.528%
GF - Candidate SNPs considering pop. struct. correction (221) France 0.067 [0.064;0.071] 0.053 [0.051;0.056] 0.042 [0.04;0.044] -20.755% 26.415%
LFMM - Candidate SNPs considering pop. struct. correction (221) Spain 0.132 [0.13;0.134] 0.112 [0.111;0.113] 0.095 [0.093;0.096] -15.179% 17.857%
LFMM - Candidate SNPs considering pop. struct. correction (221) France 0.059 [0.056;0.062] 0.05 [0.047;0.052] 0.042 [0.04;0.044] -16% 18%
pRDA - Candidate SNPs considering pop. struct. correction (221) Spain 0.09 [0.089;0.091] 0.118 [0.117;0.12] 0.155 [0.152;0.157] 31.356% -23.729%
pRDA - Candidate SNPs considering pop. struct. correction (221) France 0.031 [0.03;0.033] 0.042 [0.04;0.044] 0.055 [0.053;0.058] 30.952% -26.19%
RDA - Candidate SNPs considering pop. struct. correction (221) Spain 0.079 [0.078;0.08] 0.117 [0.116;0.119] 0.172 [0.17;0.175] 47.009% -32.479%
RDA - Candidate SNPs considering pop. struct. correction (221) France 0.026 [0.024;0.027] 0.039 [0.037;0.04] 0.058 [0.055;0.06] 48.718% -33.333%
GDM - Control SNPs unmatching allele frequencies (380) Spain 0.141 [0.139;0.143] 0.116 [0.115;0.117] 0.095 [0.094;0.097] -18.103% 21.552%
GDM - Control SNPs unmatching allele frequencies (380) France 0.055 [0.053;0.058] 0.045 [0.043;0.047] 0.037 [0.035;0.038] -17.778% 22.222%
GF - Control SNPs unmatching allele frequencies (380) Spain 0.123 [0.121;0.125] 0.11 [0.108;0.111] 0.098 [0.096;0.099] -10.909% 11.818%
GF - Control SNPs unmatching allele frequencies (380) France 0.059 [0.056;0.061] 0.052 [0.05;0.054] 0.046 [0.044;0.048] -11.538% 13.462%
LFMM - Control SNPs unmatching allele frequencies (380) Spain 0.12 [0.118;0.122] 0.113 [0.112;0.114] 0.107 [0.105;0.108] -5.31% 6.195%
LFMM - Control SNPs unmatching allele frequencies (380) France 0.053 [0.051;0.056] 0.05 [0.048;0.052] 0.047 [0.045;0.049] -6% 6%
pRDA - Control SNPs unmatching allele frequencies (380) Spain 0.093 [0.091;0.094] 0.113 [0.112;0.114] 0.137 [0.135;0.139] 21.239% -17.699%
pRDA - Control SNPs unmatching allele frequencies (380) France 0.038 [0.037;0.04] 0.047 [0.045;0.049] 0.058 [0.055;0.06] 23.404% -19.149%
RDA - Control SNPs unmatching allele frequencies (380) Spain 0.11 [0.109;0.112] 0.113 [0.111;0.114] 0.115 [0.114;0.117] 1.77% -2.655%
RDA - Control SNPs unmatching allele frequencies (380) France 0.049 [0.047;0.051] 0.05 [0.048;0.052] 0.051 [0.049;0.053] 2% -2%
GDM - Control SNPs matching allele frequencies (380) Spain 0.139 [0.136;0.141] 0.118 [0.117;0.119] 0.1 [0.099;0.101] -15.254% 17.797%
GDM - Control SNPs matching allele frequencies (380) France 0.053 [0.05;0.055] 0.045 [0.042;0.046] 0.038 [0.036;0.039] -15.556% 17.778%
GF - Control SNPs matching allele frequencies (380) Spain 0.124 [0.122;0.125] 0.11 [0.108;0.111] 0.097 [0.096;0.099] -11.818% 12.727%
GF - Control SNPs matching allele frequencies (380) France 0.059 [0.056;0.061] 0.052 [0.049;0.054] 0.046 [0.044;0.048] -11.538% 13.462%
LFMM - Control SNPs matching allele frequencies (380) Spain 0.119 [0.117;0.121] 0.113 [0.112;0.114] 0.107 [0.105;0.109] -5.31% 5.31%
LFMM - Control SNPs matching allele frequencies (380) France 0.053 [0.05;0.055] 0.05 [0.048;0.052] 0.047 [0.045;0.049] -6% 6%
pRDA - Control SNPs matching allele frequencies (380) Spain 0.127 [0.125;0.129] 0.111 [0.11;0.113] 0.098 [0.096;0.099] -11.712% 14.414%
pRDA - Control SNPs matching allele frequencies (380) France 0.059 [0.056;0.061] 0.051 [0.049;0.053] 0.045 [0.043;0.047] -11.765% 15.686%
RDA - Control SNPs matching allele frequencies (380) Spain 0.11 [0.109;0.112] 0.113 [0.111;0.114] 0.115 [0.113;0.117] 1.77% -2.655%
RDA - Control SNPs matching allele frequencies (380) France 0.049 [0.047;0.051] 0.05 [0.048;0.052] 0.051 [0.049;0.053] 2% -2%
GDM - All SNPs (9817) Spain 0.15 [0.148;0.152] 0.118 [0.116;0.119] 0.092 [0.091;0.093] -22.034% 27.119%
GDM - All SNPs (9817) France 0.055 [0.053;0.058] 0.043 [0.041;0.045] 0.033 [0.032;0.035] -23.256% 27.907%
GF - All SNPs (9817) Spain 0.124 [0.122;0.125] 0.109 [0.108;0.11] 0.096 [0.094;0.098] -11.927% 13.761%
GF - All SNPs (9817) France 0.06 [0.057;0.062] 0.052 [0.05;0.055] 0.046 [0.044;0.048] -11.538% 15.385%
LFMM - All SNPs (9817) Spain 0.119 [0.117;0.121] 0.113 [0.111;0.114] 0.106 [0.105;0.108] -6.195% 5.31%
LFMM - All SNPs (9817) France 0.053 [0.051;0.056] 0.05 [0.048;0.052] 0.047 [0.045;0.049] -6% 6%
pRDA - All SNPs (9817) Spain 0.102 [0.1;0.103] 0.114 [0.113;0.115] 0.128 [0.126;0.13] 12.281% -10.526%
pRDA - All SNPs (9817) France 0.042 [0.04;0.044] 0.048 [0.046;0.05] 0.054 [0.051;0.056] 12.5% -12.5%
RDA - All SNPs (9817) Spain 0.106 [0.105;0.108] 0.114 [0.112;0.115] 0.121 [0.12;0.123] 6.14% -7.018%
RDA - All SNPs (9817) France 0.046 [0.043;0.048] 0.049 [0.046;0.051] 0.052 [0.05;0.055] 6.122% -6.122%
CD - Mean annual temperature (bio1, °C) Spain 0.155 [0.153;0.157] 0.098 [0.097;0.1] 0.062 [0.06;0.063] -36.735% 58.163%
CD - Mean annual temperature (bio1, °C) France 0.083 [0.079;0.087] 0.052 [0.049;0.054] 0.032 [0.031;0.034] -38.462% 59.615%
CD - Annual precipitation (bio12, mm) Spain 0.07 [0.069;0.071] 0.104 [0.103;0.106] 0.154 [0.152;0.156] 48.077% -32.692%
CD - Annual precipitation (bio12, mm) France 0.033 [0.031;0.034] 0.049 [0.047;0.051] 0.074 [0.071;0.077] 51.02% -32.653%
CD - Precipitation seasonality (bio15, index) Spain 0.142 [0.14;0.144] 0.103 [0.102;0.104] 0.074 [0.073;0.075] -28.155% 37.864%
CD - Precipitation seasonality (bio15, index) France 0.078 [0.075;0.082] 0.056 [0.054;0.059] 0.04 [0.038;0.042] -28.571% 39.286%
CD - Isothermality (bio3, index) Spain 0.074 [0.073;0.075] 0.11 [0.109;0.112] 0.163 [0.161;0.165] 48.182% -32.727%
CD - Isothermality (bio3, index) France 0.029 [0.027;0.03] 0.044 [0.042;0.046] 0.066 [0.063;0.069] 50% -34.091%
CD - Temperature seasonality (bio4, °C) Spain 0.122 [0.12;0.124] 0.102 [0.1;0.104] 0.085 [0.082;0.088] -16.667% 19.608%
CD - Temperature seasonality (bio4, °C) France 0.071 [0.067;0.076] 0.059 [0.056;0.062] 0.049 [0.047;0.051] -16.949% 20.339%
CD - Summer heat moisture index (SHM, °C/mm) Spain 0.133 [0.131;0.135] 0.111 [0.11;0.113] 0.093 [0.091;0.094] -16.216% 19.82%
CD - Summer heat moisture index (SHM, °C/mm) France 0.059 [0.056;0.061] 0.049 [0.047;0.051] 0.041 [0.039;0.043] -16.327% 20.408%
CD - All climatic variables (Euclidean distance) Spain 0.113 [0.111;0.115] 0.112 [0.111;0.113] 0.111 [0.109;0.113] -0.893% 0.893%
CD - All climatic variables (Euclidean distance) France 0.051 [0.049;0.054] 0.051 [0.048;0.053] 0.05 [0.048;0.052] -1.961% 0%

Model coefficients

Below, we plot the 95% credible intervals of:

  • the \(\beta_{X}\) coefficients, which stand for the association between mortality rates in NFI plots and the genomic offset predictions (i.e., capturing the potential maladaptation of the populations at the location of the NFI plots).

  • the \(\beta_C\) coefficients, which stand for the association between the tree basal area (all species confounded) in the NFI plots and the mortality rates.

  • the \(\beta_{DBH}\) coefficients, which stand for the association between the mean DBH (diameter at breast height) of all maritime pines in the NFI plots (including adults, dead trees and seedlings/saplings) and the mortality rates.

  • the \(\beta_{C*DBH}\) coefficients, which stand for the interaction between the mean DBH and the tree basal area.
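As a numeric companion to the interval plots, the coefficients whose 95% credible interval excludes zero can be flagged directly from the coefficient table (a sketch on toy values; the real coefftab carries the same conf95_low/conf95_high columns):

```r
# Toy stand-in for the saved coefficient table (illustrative values only)
coefftab_toy <- data.frame(
  term        = c("beta_GO", "beta_C", "beta_DBH"),
  median      = c( 0.25,  0.08, -0.02),
  conf95_low  = c( 0.10, -0.01, -0.10),
  conf95_high = c( 0.40,  0.17,  0.06)
)

# Keep the coefficients whose 95% credible interval excludes zero
subset(coefftab_toy, conf95_low > 0 | conf95_high < 0)
```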

Code
coefftab <- readRDS(file=here("outputs/ValidationNFI/ModelM62_coefftab.rds"))
  
coeff_match <- list(beta_GO=list("Regression coefficients $\\beta_{X}$ in mortality models",
                                 c(0.5,0.85),
                                 c(-0.6,1)), 
                    beta_C=list("Regression coefficients $\\beta_C$ in mortality models",
                                c(0.5,0.85),
                                c(-0.01,0.2)),
                    beta_DBH=list("Regression coefficients $\\beta_{DBH}$ in mortality models",
                                  c(0.5,0.85),
                                  c(-0.05,0.45)),
                    beta_C_DBH=list("Regression coefficients $\\beta_{C*DBH}$ in mortality models",
                                    c(0.5,0.85),
                                    c(-0.2,0.1)))

p <- lapply(c("beta_GO","beta_C","beta_DBH","beta_C_DBH"), function(coeff){
  
p <- coefftab %>% 
  filter(term==coeff) %>% 
  ggplot(aes(x = method,
               y = median,
               ymin = conf95_low, 
               ymax = conf95_high,
               color = snp_set_name,
               shape = snp_set_name)) +
  {if(coeff=="beta_GO") annotate("rect", xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf, fill="green", alpha=0.15)} + 
  geom_hline(yintercept = 0,color="gray") +
  geom_pointinterval(position = position_dodge(width = .4),point_size=3.5,size=3) +
  ylab(TeX(coeff_match[[coeff]][[1]])) + xlab("") +
  ylim(coeff_match[[coeff]][[3]]) +
  scale_color_manual(values=colors_coeff) +
  scale_shape_manual(values = c(rep(16,6),rep(17,7))) +
  theme_bw() +
  labs(color="",shape="") +
  theme(axis.text.x =  element_text(size=17),
        axis.text.y = element_text(size=17),
        axis.title.y = element_text(size=19),
        legend.title=element_blank(), 
        legend.box.background = element_rect(colour = "gray", linewidth=0.6),
        strip.text.x = element_text(size = 16),
        strip.background = element_blank(),
        legend.position = coeff_match[[coeff]][[2]],
        legend.text=element_text(size=11),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.x=element_blank()) +
  guides(color=guide_legend(ncol=2),
         shape = guide_legend(override.aes = list(size =2 )))

# save in pdf and png
ggsave(p,file=here(paste0("figs/ValidationNFI/ModelM62_",coeff,"_IntervalPlots.pdf")),
       device="pdf",
       height=8.8,
       width=9)

return(p)

})
Code
p

\(\beta_{GO}\) estimates against a null distribution

We evaluated the probability that the estimated relationship between mortality rates and GO predictions could be obtained by chance. To do so, we re-ran the model on the NFI data, replacing the GO predictions with a randomly generated variable \(X\) such that \(X \sim \mathcal{N}(0,1)\). We repeated this step 100 times, yielding a null distribution of \(\beta_X\) coefficients that capture the relationship between mortality rates and a randomly generated variable. We then compared this null distribution to the estimated \(\beta_{GO}\) coefficients capturing the relationship between mortality rates and the GO predictions.
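The randomization step can be sketched as follows. This is a lightweight, self-contained illustration: the data are simulated (not the NFI plots), and glm() with a cloglog link stands in for the Stan model; the actual runs are loaded from 100simulations_m6_2_truedata_randomGO.rds.

```r
set.seed(42)

# Toy plot data standing in for the NFI plots
N        <- 500
nb_tot   <- rpois(N, 10) + 1                 # trees per plot
nb_years <- sample(5:15, N, replace = TRUE)  # census intervals
p_true   <- 1 - exp(-nb_years * exp(-5))     # no predictor effect in the null
nb_dead  <- rbinom(N, nb_tot, p_true)

# Null distribution of beta_X: refit the mortality model 100 times,
# each time with a freshly drawn random predictor X ~ N(0, 1)
beta_null <- sapply(1:100, function(i){
  X <- rnorm(N)
  fit <- glm(cbind(nb_dead, nb_tot - nb_dead) ~ X + offset(log(nb_years)),
             family = binomial(link = "cloglog"))
  coef(fit)[["X"]]
})

# With a pure-noise predictor, the beta_X distribution is centred on zero
mean(beta_null)
```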

Code
# Load the coefficient values of models with predicted GO
coefftab <- readRDS(file=here("outputs/ValidationNFI/ModelM62_coefftab.rds")) %>% 
  mutate(run_ID = method_snp_set_name,
         GO = "predicted") %>% 
  dplyr::filter(term == "beta_GO") %>%
  dplyr::select(run_ID, GO, mean, contains("conf"))

# Load the coefficient values of models with random GO and merge with the predicted GO coefficient table
coefftab <- readRDS(file=here(paste0("outputs/ValidationNFI/Simulations/100simulations_m6_2_truedata_randomGO.rds"))) %>% 
  bind_rows(.id = "sim_ID") %>% 
  dplyr::filter(term == "beta_GO") %>%
  mutate(run_ID = paste0("sim ", sim_ID),
         GO = "random") %>% 
  dplyr::select(run_ID, GO, mean, contains("conf")) %>% 
  bind_rows(coefftab)
  
# Plot the mean and 95% credible intervals with interval plots
p <- coefftab %>% 
  ggplot(aes(mean, reorder(run_ID, mean))) +
  geom_vline(xintercept = 0, color="gray30") +
  geom_errorbar(aes(xmin = conf95_low, xmax = conf95_high, color=GO)) +
  scale_color_manual(values=c("magenta","gray"),
                     name = "Predictor") +
  geom_point() +
  theme_bw() +
  labs(x="Predictor effect size", y="") +
  theme(axis.text.y = element_text(size=6),
        legend.box.background = element_rect(colour = "gray", linewidth=0.6),
        legend.position = c(0.9,0.1),
        legend.text=element_text(size=10))

# save the plot in pdf
ggsave(p,
       file=here(paste0("figs/ValidationNFI/ModelM62_IntervalPlots_RandomGO_vs_PredictedGO.pdf")),
       device="pdf",
       height=12,
       width=10)

p

Mapping model predictions

We add to the model's Stan code the estimation of:

  • \(p\) the probability of mortality during the survey period in each plot (p in the Stan code).

  • \(p_{annual}\) the annual probability of mortality in each plot (p_annual in the Stan code).

  • \(p_{annual,go}\) the annual probability of mortality in each plot with \(DBH\) and \(C\) fixed to their means and the country fixed to Spain (p_annual_go in the Stan code).

  • \(m_{pred}\) the predicted number of dead trees in each plot during the survey period (nb_dead_pred in the Stan code).

  • \(r_{pred}\) the residuals (in each plot during the survey period): the observed number of dead trees \(m_i\) minus the predicted number of dead trees \(m_{pred}\) (nb_dead_res in the Stan code).
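For a single posterior draw, the generated quantities block amounts to the following in R (a sketch with illustrative values for \(p\); in the model, p comes from the transformed parameters):

```r
set.seed(1)

# One posterior draw of the per-plot mortality probabilities (illustrative values)
p       <- c(0.05, 0.10, 0.15)
nb_tot  <- c(20, 35, 12)    # total trees per plot
nb_dead <- c(2, 3, 1)       # observed dead trees per plot

# Posterior-predictive number of dead trees (binomial_rng in the Stan code)
nb_dead_pred <- rbinom(length(p), nb_tot, p)

# Residuals: observed minus predicted number of dead trees
nb_dead_res <- nb_dead - nb_dead_pred
```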

Code
# Stan code
stancode = stan_model(here("scripts/StanModels/ValidationNFI_m6_2_with_predictions.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  vector[N] log_nb_years;                                                           // Offset to account for different census intervals
  vector[N] C;                                                                      // Proxy of the competition among trees (i.e. basal area)
  vector[N] DBH;                                                                    // Proxy of the average tree age (i.e. mean DBH)
  vector[N] GO;                                                                     // Genomic offset
  int nb_dead[N];                                                                   // Number of dead trees in the plot
  int nb_tot[N];                                                                    // Total number of trees in the plot
  int<lower=0> nb_country;                                                          // Number of countries
  int<lower=0, upper=nb_country> country[N];                                        // Countries
}
parameters {
  vector[nb_country] alpha_country;
  real beta_GO;
  real beta_C;
  real beta_DBH;
  real beta_C_DBH;
}
transformed parameters {
  vector[N] p;                               // probability of mortality during the inventory period of each plot
  vector[N] p_annual;                        // annual probability of mortality
  vector[N] p_annual_go;                     // annual probability of mortality with all predictors except GO fixed

  p = inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C  + beta_DBH * DBH + beta_C_DBH * C .* DBH + log_nb_years);
  p_annual = inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C  + beta_DBH * DBH + beta_C_DBH * C .* DBH + 1);
  p_annual_go  = inv_cloglog(alpha_country[1] + beta_GO * GO + beta_C * mean(C)  + beta_DBH * mean(DBH) + beta_C_DBH * mean(C) .* mean(DBH) + 1);
}
model {
  nb_dead ~ binomial(nb_tot,p);

  alpha_country ~ normal(0, 1);
  beta_GO ~ normal(0, 1);
  beta_C ~ normal(0, 1);
  beta_DBH ~ normal(0, 1);
  beta_C_DBH ~ normal(0, 1);
}
generated quantities{
  vector[N] nb_dead_pred;
  vector[N] nb_dead_res;
  for (n in 1:N) {
    nb_dead_pred[n] = binomial_rng(nb_tot[n], p[n]);
    nb_dead_res[n] = nb_dead[n] - nb_dead_pred[n];
  }
} 

We estimate these quantities for the genomic offset predictions obtained with RDA and the set of all candidate SNPs.

Code
method_snp_set_code_i <- "RDA_all_cand"
Code
params_to_estimate <- c("p", "p_annual", "p_annual_go", "nb_dead_pred", "nb_dead_res")

# Subset the data - keeping only one set of method / SNP set
sub_data <- df %>% 
  filter(method_snp_set_code == method_snp_set_code_i) %>% 
  inner_join(nfi_data, by="plotcode")
      
saveRDS(sub_data, file = here(paste0("outputs/ValidationNFI/StanModels/M62_with_predictions/subdata_M62_with_predictions_",method_snp_set_code_i,".rds")))

# Data in a list for Stan 
stanlist <- list(N = nrow(sub_data),
                     nb_dead = sub_data$nb_dead,
                     nb_tot=sub_data$nb_tot,
                     GO = (sub_data$GO - mean(sub_data$GO))/sd(sub_data$GO),
                     C = (sub_data$basal_area - mean(sub_data$basal_area))/sd(sub_data$basal_area),
                     DBH = (sub_data$mean_DBH - mean(sub_data$mean_DBH))/sd(sub_data$mean_DBH),
                     nb_country=length(unique(sub_data$country)),
                     country=as.numeric(as.factor(sub_data$country)),
                     log_nb_years=log(sub_data$nb_years))

saveRDS(stanlist, file = here(paste0("outputs/ValidationNFI/StanModels/M62_with_predictions/stanlist_M62_with_predictions_",method_snp_set_code_i,".rds")))

# Running the Stan model
sampling(stancode, data = stanlist, iter = 1500, chains = 4, cores = 4,init=0, save_warmup = FALSE, include = TRUE, pars=params_to_estimate) %>%
  broom.mixed::tidyMCMC(pars=params_to_estimate,
                             droppars = NULL, 
                              robust = TRUE,  # give the median and the median absolute deviation (MAD)
                             ess = T, # effective sample size estimates
                             rhat = T, # Rhat estimates
                             conf.int = T, 
                             conf.level = 0.95 # include 95% credible intervals
                        ) %>% 
  saveRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_with_predictions/coefftab_",method_snp_set_code_i,".rds")))

We map the predictions.

Code
# Package for mapping
library(rnaturalearth)
world <- ne_countries(scale = "medium", returnclass = "sf")

# Function for mapping
map_pred_nfi <- function(x, # parameter to plot 
                         ggtitle = NULL, 
                         legendtitle = "Annual probability\nof tree mortality",
                         plots_above, # thresholds above/below which plots are re-drawn on top so they remain visible
                         method_snp_set_code,
                         unbalanced_rainbow){

coeffs <- readRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_with_predictions/coefftab_",method_snp_set_code,".rds"))) %>% 
  filter(str_detect(term, paste0("^", x, "\\[")))
  
data <- readRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_with_predictions/subdata_M62_with_predictions_",method_snp_set_code,".rds"))) %>% 
  dplyr::select(plotcode, longitude, latitude) %>% 
  bind_cols(coeffs) 

p <- ggplot() + 
    geom_sf(data = world, fill="gray98") + 
    theme_bw() +
    scale_x_continuous(limits = c(-10, 11)) +
    scale_y_continuous(limits = c(35, 51)) + 
    geom_point(data=data, aes(x=longitude,y=latitude,color=estimate), size=1, alpha = 0.8) + 
  xlab("") + ylab("") +
    ggtitle(ggtitle) +
    theme(legend.position = c(0.85,0.2),
          axis.text = element_text(size=14),
          legend.box.background = element_rect(fill = "white", linewidth=0.6, colour="gray"))

  if (x == "nb_dead_res") {
    
    p <- p + 
      geom_point(data=data %>% filter(estimate > plots_above[[1]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) +
      geom_point(data=data %>% filter(estimate < plots_above[[2]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) +
      geom_point(data=data %>% filter(estimate > plots_above[[3]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) + 
      geom_point(data=data %>% filter(estimate < plots_above[[4]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) + 
      scale_color_stepsn(name = legendtitle, colours = rainbow(8), limits=NULL, breaks=c(-20,-10,0,10,20,30,40)) 
    
  } else {
    
    p <- p + 
      geom_point(data=data %>% filter(estimate>plots_above[[1]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) +
      geom_point(data=data %>% filter(estimate>plots_above[[2]]), aes(x=longitude,y=latitude,color=estimate), size=1.4) +
      scale_color_gradientn(name = legendtitle, colours = unbalanced_rainbow, limits=NULL)
  }
  
  p
}

Probability of mortality \(p\) during the survey period in each plot.

Code
p <- map_pred_nfi(x = "p", 
                  ggtitle = NULL, 
                  legendtitle = "Probability of tree\nmortality during\nthe inventory period",
                  plots_above = c(0.2,0.4),
                  unbalanced_rainbow = c("#00FF66", "#0066FF",rep("#CCFF00",2),rep("orange",3),rep("#FF0000",9)),
                  method_snp_set_code = method_snp_set_code_i)

ggsave(plot = p, filename = here(paste0("figs/ValidationNFI/ModelM62_PredictedMortalityProbabilityInventoryPeriod_Map_",method_snp_set_code_i,".pdf")), height = 6, width = 6)

p

Annual probability \(p_{annual}\) of tree mortality in each plot.

Code
p <- map_pred_nfi(x = "p_annual", 
                  ggtitle = NULL, 
                  legendtitle = "Annual probability\nof tree mortality",
                  plots_above = c(0.1,0.2),
                  unbalanced_rainbow = c("#00FF66", "#0066FF",rep("#CCFF00",2),rep("orange",3),rep("#FF0000",9)),
                  method_snp_set_code = method_snp_set_code_i)

ggsave(plot = p, filename = here(paste0("figs/ValidationNFI/ModelM62_PredictedAnnualMortalityProbability_Map_",method_snp_set_code_i,".pdf")), height = 6, width = 6)

p

Annual probability \(p_{annual,go}\) of tree mortality in each plot with \(DBH\) and \(C\) fixed to the mean and country fixed for Spain.

Code
p <- map_pred_nfi(x = "p_annual_go", 
                  ggtitle = NULL, 
                  legendtitle = "Annual probability\nof tree mortality",
                  plots_above = c(0.065,0.08),
                  unbalanced_rainbow = c("#00FF66", "#0066FF",rep("#CCFF00",2),rep("orange",3),rep("#FF0000",5)),
                  method_snp_set_code = method_snp_set_code_i)

ggsave(plot = p, filename = here(paste0("figs/ValidationNFI/ModelM62_PredictedAnnualMortalityProbability_varyingGO_OtherParamsFixed_Map_",method_snp_set_code_i,".pdf")), height = 6, width = 6)

p

Predicted number of dead trees \(m_{pred}\) in each plot during the survey period.

Code
p <- map_pred_nfi(x = "nb_dead_pred", 
                  ggtitle = NULL, 
                  legendtitle = "Predicted number\nof dead trees",
                  unbalanced_rainbow = c("#00FF66", "#0066FF",rep("#CCFF00",2),rep("orange",3),rep("#FF0000",9)),
                  plots_above = c(8,14),
                  method_snp_set_code = method_snp_set_code_i)

ggsave(plot = p, filename = here(paste0("figs/ValidationNFI/ModelM62_PredictedNumberDeadTrees_Map_",method_snp_set_code_i,".pdf")), height = 6, width = 6)

p

Residuals \(r_{pred}\) in each plot during the survey period: the observed number of dead trees \(m_i\) minus the predicted number of dead trees \(m_{pred}\).

Code
p <- map_pred_nfi(x = "nb_dead_res", 
                  ggtitle = NULL, 
                  legendtitle = "Residuals",
                  unbalanced_rainbow = NULL,
                  plots_above = c(10,-10,20,-20),
                  method_snp_set_code = method_snp_set_code_i)

ggsave(plot = p, filename = here(paste0("figs/ValidationNFI/ModelM62_Residuals_Map_",method_snp_set_code_i,".pdf")), height = 6, width = 6)

p

Correlation coefficients

We estimate correlations between genomic offset predictions and the observed annual mortality rates, after correction for confounding factors.

We estimate two types of correlations:

  • between genomic offset values and the observed annual mortality rates (after correction)

  • between the plot ranks based on the genomic offset predictions and the plot ranks based on the observed annual mortality rates (after correction).
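The two coefficients answer slightly different questions: Pearson measures linear association on the raw values, while Spearman measures monotone association by correlating ranks. A toy illustration of the difference (Python, independent of the report's R code; the data are made up and the simple `ranks` helper assumes no ties):

```python
import statistics

def pearson(x, y):
    # classic Pearson correlation coefficient
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(v):
    # rank transform (no tie handling, for illustration only)
    return [sorted(v).index(a) + 1 for a in v]

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # perfectly monotone, but not linear

print(pearson(x, y))                 # ≈ 0.981 (< 1: relation is not linear)
print(pearson(ranks(x), ranks(y)))   # 1.0 (perfect rank/Spearman correlation)
```

The Spearman (rank) correlation is therefore the less restrictive of the two when only a monotone association between genomic offset and mortality is expected.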

Code
# Stan code
stancode = stan_model(here("scripts/StanModels/ValidationNFI_m6_2_with_predictions_withoutGO.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  vector[N] log_nb_years;                                                           // Offset to account for different census intervals
  vector[N] C;                                                                      // Proxy of the competition among trees (i.e. basal area)
  vector[N] DBH;                                                                    // Proxy of the average tree age (i.e. mean DBH)
  int nb_dead[N];                                                                   // Number of dead trees in the plot
  int nb_tot[N];                                                                    // Total number of trees in the plot
  int<lower=0> nb_country;                                                          // Number of countries
  int<lower=0, upper=nb_country> country[N];                                        // Countries
}
parameters {
  vector[nb_country] alpha_country;
  real beta_C;
  real beta_DBH;
  real beta_C_DBH;
}
transformed parameters {
  vector[N] p;                               // probability of mortality during the inventory period of each plot
  vector[N] p_annual;                        // annual probability of mortality explained by the confounding factors
  //vector[N] res_mortality;                   // residual mortality

  p = inv_cloglog(alpha_country[country] + beta_C * C  + beta_DBH * DBH + beta_C_DBH * C .* DBH + log_nb_years);
  p_annual = inv_cloglog(alpha_country[country] + beta_C * C  + beta_DBH * DBH + beta_C_DBH * C .* DBH + log(1));
  //res_mortality = (nb_dead/nb_tot)/exp(log_nb_years) - p_annual;
  }
model {
  nb_dead ~ binomial(nb_tot,p);

  alpha_country ~ normal(0, 1);
  beta_C ~ normal(0, 1);
  beta_DBH ~ normal(0, 1);
  beta_C_DBH ~ normal(0, 1);
} 
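This model estimates the annual mortality probability explained by the confounding factors alone. The observed annual mortality rate corrected for confounders is then obtained in the next chunks as the residual

\[
r_i \;=\; \frac{m_i}{n_i\,\Delta_i} \;-\; \hat{p}_{annual,i},
\]

where \(m_i\) is the number of dead trees, \(n_i\) the total number of trees and \(\Delta_i\) the census interval of plot \(i\), and \(\hat{p}_{annual,i}\) is a posterior point estimate (mean or median) of `p_annual` (this matches the commented-out `res_mortality` line in the Stan code above).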
Code
# Data in a list for Stan 
stanlist <- list(N = nrow(nfi_data),
                 nb_dead = nfi_data$nb_dead,
                 nb_tot=nfi_data$nb_tot,
                 C = (nfi_data$basal_area - mean(nfi_data$basal_area))/sd(nfi_data$basal_area),
                 DBH = (nfi_data$mean_DBH - mean(nfi_data$mean_DBH))/sd(nfi_data$mean_DBH),
                 nb_country=length(unique(nfi_data$country)),
                 country=as.numeric(as.factor(nfi_data$country)),
                 log_nb_years=log(nfi_data$nb_years))

       
# Running the Stan model
mod <- sampling(stancode, 
                data = stanlist, 
                iter = 2000, 
                chains = 4, 
                cores = 4,
                init=0, 
                save_warmup = FALSE,
                warmup = 1000, 
                verbose =T)

# Save the model
saveRDS(mod, file=here(paste0("outputs/ValidationNFI/StanModels/M62_WithoutGO.rds")))
Code
mod <- readRDS(file=here("outputs/ValidationNFI/StanModels/M62_WithoutGO.rds"))

params_to_estimate <- "p_annual"

# Model coefficients
coefftab <- broom.mixed::tidyMCMC(mod,
                                  pars=params_to_estimate,
                                  droppars = NULL, 
                                  robust = FALSE, # give mean and standard deviation
                                  ess = T, # effective sample size estimates
                                  rhat = T, # Rhat estimates
                                  conf.int = T, 
                                  conf.level = 0.95 # include 95% credible intervals
                                 ) %>% 
  dplyr::rename(conf95_low=conf.low,
                conf95_high=conf.high,
                mean=estimate,
                std_deviation=std.error)
  
coefftab <- broom.mixed::tidyMCMC(mod,
                                  pars=params_to_estimate,
                                  droppars = NULL, 
                                  robust = TRUE, # give the median and the median absolute deviation (MAD)
                                  ess = F, 
                                  rhat = F, 
                                  conf.int = T, 
                                  conf.level = 0.80 # include 80% credible intervals
                                  ) %>% 
  dplyr::rename(conf80_low=conf.low,
                conf80_high=conf.high,
                median=estimate,
                mean_absolute_deviation=std.error) %>%
    inner_join(coefftab,by=c("term"))
  
coefftab_nfi <- nfi_data %>% 
  bind_cols(coefftab) %>% 
  mutate(res_mortality_median = annual_prop_dead - median) %>% 
  arrange(res_mortality_median) %>% 
  mutate(rank_res_mortality_median = 1:nrow(.)) %>% 
  mutate(res_mortality_mean = annual_prop_dead - mean) %>% 
  arrange(res_mortality_mean) %>% 
  mutate(rank_res_mortality_mean = 1:nrow(.))

# save the table
saveRDS(coefftab_nfi, file=here(paste0("outputs/ValidationNFI/M62_CoeffTable_WithoutGO.rds")))
Code
coefftab_nfi <- readRDS(file=here(paste0("outputs/ValidationNFI/M62_CoeffTable_WithoutGO.rds")))

Correlation between the mean and median estimates of the annual probability of mortality that can be explained by the confounding factors (i.e., competition, mean DBH, and country):

Code
cor(coefftab_nfi$mean, coefftab_nfi$median)
[1] 0.9999993

Correlation between plot ranks of the observed annual mortality rates, after correcting for confounding factors, based on either the mean or the median:

Code
cor(coefftab_nfi$rank_res_mortality_mean,coefftab_nfi$rank_res_mortality_median)
[1] 0.9999997

Given the high correlations, we conclude that either the mean or the median can be used for calculating the correlations.

We calculate the correlations using the median estimates.

Code
corrtab <- lapply(unique(df$method_snp_set_code), function(method_snp_set_i){
  
  sub_data <- df %>% 
    filter(method_snp_set_code == method_snp_set_i) %>% 
    #arrange(GO) %>% 
    #mutate(GO_rank = 1:nrow(.)) %>% 
    inner_join(coefftab_nfi, by="plotcode")
  
  tibble(pearson_correlation = cor.test(sub_data$res_mortality_mean, sub_data$GO, method="pearson")$estimate[[1]],
         pearson_pvalue = cor.test(sub_data$res_mortality_mean, sub_data$GO, method="pearson")$p.value,
         spearman_correlation = cor.test(sub_data$res_mortality_mean, sub_data$GO, method="spearman")$estimate[[1]],
         spearman_pvalue = cor.test(sub_data$res_mortality_mean, sub_data$GO, method="spearman")$p.value,
         method_snp_set_code = method_snp_set_i,
         method_snp_set_name = unique(sub_data$method_snp_set_name),
         snp_set_name = unique(sub_data$snp_set_name),
         snp_set_code = unique(sub_data$snp_set_code),
         method = unique(sub_data$method))
}) %>% 
  bind_rows() 
Warning in cor.test.default(sub_data$res_mortality_mean, sub_data$GO, method = "spearman"): Impossible de calculer la p-value exacte avec des ex-aequos
Code
corrtab <- corrtab %>% 
  mutate(snp_set_name = factor(snp_set_name, levels = c(unique(corrtab$snp_set_name)[[6]],
                                                        unique(corrtab$snp_set_name)[[4]],
                                                        unique(corrtab$snp_set_name)[[5]],
                                                        unique(corrtab$snp_set_name)[[3]],
                                                        unique(corrtab$snp_set_name)[[1]],
                                                        unique(corrtab$snp_set_name)[[2]],
                                                        unique(corrtab$snp_set_name)[[7]],
                                                        unique(corrtab$snp_set_name)[[8]],
                                                        unique(corrtab$snp_set_name)[[9]],
                                                        unique(corrtab$snp_set_name)[[10]],
                                                        unique(corrtab$snp_set_name)[[11]],
                                                        unique(corrtab$snp_set_name)[[12]],
                                                        unique(corrtab$snp_set_name)[[13]])),
          method = factor(method, levels = c(unique(corrtab$method)[[1]],
                                             unique(corrtab$method)[[2]],
                                             unique(corrtab$method)[[3]],
                                             unique(corrtab$method)[[4]],
                                             unique(corrtab$method)[[5]],
                                             unique(corrtab$method)[[6]]))) 

Comment: When estimating the Spearman correlations, we get the following message:

Message d'avis : Dans cor.test.default(sub_data$res_mortality_mean, sub_data$GO, method = "spearman") : Impossible de calculer la p-value exacte avec des ex-aequos (i.e., "Cannot compute the exact p-value with ties")

It indicates that the function cannot compute an exact p-value because of ties (i.e., identical values) in the data. Spearman's correlation, being non-parametric, converts the data to ranks and correlates the ranks; when two or more values are identical they cannot be ranked uniquely, so R assigns them average ranks and falls back on an approximate p-value rather than the exact one. The correlation estimate itself remains valid. In our case the number of ties is small (119 tied data points out of 11,917, corresponding to 47 unique values), so the approximation should be close to the exact value.
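The average-rank (mid-rank) treatment of tied values can be sketched as follows (Python, illustrative only; this mirrors the default `ties.method = "average"` convention of R's `rank()`, which Spearman's correlation relies on):

```python
def average_ranks(values):
    # assign average ranks to tied values, as done before computing Spearman's rho
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([3, 1, 4, 1, 5]))  # → [3.0, 1.5, 4.0, 1.5, 5.0]
```

Because tied observations share a non-integer average rank, the permutation distribution underlying the exact p-value no longer applies, hence the approximation.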

We plot the correlations.

Code
correlation_types <- c("pearson_correlation", "spearman_correlation")

lapply(correlation_types, function(coeff){
  
  p <- corrtab %>% 
    pivot_longer(values_to = "correlation_estimate", names_to = "correlation_type", cols = all_of(correlation_types)) %>% 
    mutate(correlation_pvalue = ifelse(correlation_type == "pearson_correlation", pearson_pvalue, spearman_pvalue)) %>% 
    dplyr::select(-pearson_pvalue,-spearman_pvalue) %>% 
    mutate(pvalue_signi = ifelse(correlation_pvalue < 0.05, "p-value < 0.05", "p-value ≥ 0.05")) %>% 
    filter(correlation_type == coeff) %>% 
    ggplot(aes(x = method,
               y = correlation_estimate,
               color = snp_set_name,
               shape = pvalue_signi)) +
    geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.005) + 
    geom_hline(yintercept = 0,color="gray") +
    geom_point(position = position_dodge(width = .4), size=3) +
    {if(coeff=="pearson_correlation") ylab("Pearson correlation estimates")} + 
    {if(coeff=="spearman_correlation") ylab("Spearman correlation estimates (rank correlation)") } + 
    xlab("") +
    ylim(c(-0.5,0.5)) +
    scale_color_manual(values=colors_coeff) +
    scale_shape_manual(values = c(16,18)) + #, name="p-value of the correlations"
    theme_bw() +
    labs(color="",shape="") +
    theme(axis.text.x =  element_text(size=13),
          axis.text.y = element_text(size=13),
          axis.title.y = element_text(size=16),
          axis.title.x = element_text(size=1),
          legend.title=element_text(size=10), 
          #legend.box.background = element_rect(colour = "gray", linewidth=0.6),
          strip.text.x = element_text(size = 16),
          strip.background = element_blank(),
          #legend.position = c(0.63, 0.85),
          legend.text=element_text(size=9),
          panel.grid.minor.x=element_blank(),
          panel.grid.major.x=element_blank()) +
    guides(color=guide_legend(ncol=1),
           shape = guide_legend(override.aes = list(size =2 )))

# save in pdf and png
ggsave(p,file=here(paste0("figs/ValidationNFI/ModelM62_",coeff,"_CorrelationPlots.pdf")),
       device="pdf",
       height=7,
       width=12)

p

})
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 37 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
[[1]]


[[2]]

GO predictions in Galicia

GO predictions in Galicia (especially those obtained with GF) show a strong boundary between two zones (see Figure 4b of the manuscript). How can we explain this pattern?

Code
nfi_data %>% 
  mutate(year_second_survey = ifelse(second_survey > 1999, "After 2000", second_survey)) %>% 
  ggplot() + 
  geom_sf(data = world, fill = "gray98") + 
  theme_bw() +
  scale_x_continuous(limits = c(-10, 12)) +
  scale_y_continuous(limits = c(35, 51)) + 
  geom_point(aes(x = longitude,
                 y = latitude,
                 color = factor(year_second_survey)),
             alpha = 0.4, size = 0.5) + 
  xlab("") + ylab("") +
  scale_color_manual(name = "Year of the second survey", 
                     values = c("yellow2", "darkorchid", "#60F2E7", "#C1E5AC")) +
  theme(legend.box.background = element_rect(colour = "gray80", linewidth = 0.6)) +
  guides(colour = guide_legend(override.aes = list(size = 2)))

In Galicia, GO predictions from the GF approach (see Figure 4b of the manuscript) are higher for plots surveyed for the second time in 1998 than plots surveyed for the second time in 1997.
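As a quick sanity check of this visual impression, one could compare the distribution of GO values between the two groups of plots. The sketch below uses simulated toy data and illustrative column names (`go_gf`, `second_survey`), which are assumptions for illustration, not the actual dataset:

```r
# Hedged sketch: compare GF-based GO between plots whose second survey was in
# 1997 vs 1998. The data frame and its column names are placeholders.
set.seed(42)
toy <- data.frame(
  second_survey = factor(rep(c(1997, 1998), each = 50)),
  go_gf = c(rnorm(50, mean = 0.10, sd = 0.02),  # plots surveyed in 1997
            rnorm(50, mean = 0.15, sd = 0.02))  # plots surveyed in 1998
)

# Group means and a rank-based two-sample test (robust to skewed GO values)
group_means <- tapply(toy$go_gf, toy$second_survey, mean)
wtest <- wilcox.test(go_gf ~ second_survey, data = toy)
group_means
wtest$p.value
```

On the real data, a simple group comparison like this would quantify how much of the spatial boundary in GO predictions lines up with the survey-year split.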

We explore the climatic differences between the two sets of plots, i.e., plots surveyed in 1997 vs. 1998. First, we extract the annual climatic values at the location of the Galician plots surveyed for the second time in 1997 or 1998 and plot their values across the study period (i.e., between the first and second surveys).

Code
# Extracting annual climatic values at the location of the NFI plots in Galicia surveyed for the second time in 1997 or 1998
subclim <- clim %>% 
  left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>% 
    dplyr::filter(second_survey %in% 1997:1998) %>% 
    dplyr::filter(year %in% min(first_survey):1998) %>% 
    dplyr::select(second_survey,year,all_of(clim_var))



violin_plots <- lapply(clim_var, function(x) {
  
var_name <- extract_climatedt_metadata(var_clim = x) %>% 
  mutate(var_legend= paste0(description, " (", label,"; ",unit_symbol,")")) %>% 
  pull(var_legend)

subclim %>% 
  mutate(year=as.factor(year),
         second_survey=as.factor(second_survey)) %>% 
  ggplot(aes(y = .data[[x]], x = year, color = second_survey)) + 
  geom_point(shape=16,position=position_jitter(0.2),size=2,alpha=0.1) +
  geom_violin(alpha=0.3,fill="white", linewidth=0.7) + 
  xlab("") + ylab("") +
  ggtitle(var_name) +
  scale_color_manual(name="Year of the second survey",
                     values = c("yellow2","darkorchid")) +
  theme_bw() +
  guides(colour = guide_legend(override.aes = list(size=2)))

})
Code
violin_plots 

From these plots, it seems that the plots surveyed in 1997 have lower isothermality (bio3) and lower temperature seasonality (bio4) than those surveyed in 1998.
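The group differences suggested by the plots can also be summarised numerically, by averaging each climatic variable per survey-year group. The sketch below uses simulated values standing in for `subclim` (the bio3/bio4 magnitudes are invented for illustration; only the comparison logic matters):

```r
# Toy stand-in for `subclim`: compute mean bio3/bio4 per survey-year group.
set.seed(1)
toy_clim <- data.frame(
  second_survey = rep(c(1997, 1998), each = 30),
  bio3 = c(rnorm(30, 44, 1), rnorm(30, 46, 1)),     # isothermality
  bio4 = c(rnorm(30, 380, 10), rnorm(30, 420, 10))  # temperature seasonality
)
# One row per survey-year group, with the group mean of each variable
clim_means <- aggregate(cbind(bio3, bio4) ~ second_survey,
                        data = toy_clim, FUN = mean)
clim_means
```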

We also compare the mean climatic values of the reference and survey periods between the plots surveyed in 1997 and 1998. To do so, we calculate, for each plot and each climatic variable, the difference between its mean climate across the reference period (i.e., 1901-1950) and its mean climate across the survey period (i.e., between the first and second surveys).
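Note that the `diff` call used in the code below relies on row order within each plot: the reference-period mean must come first and the survey-period mean second, so that `diff` returns survey minus reference. A minimal base-R illustration of that step:

```r
# One plot, one variable: two period means, reference first, survey second.
tmn01 <- c(reference = 5.2, survey = 6.0)
delta <- diff(tmn01)  # survey-period mean minus reference-period mean
unname(delta)         # positive value = warming relative to the reference
```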

Code
# We load the datasets with mean climatic values across the reference and survey periods
# Then we calculate the difference between the mean climates of the two periods for each plot
#  We then keep only plots surveyed (second survey) in 1997 or 1998 (Galician plots)
subnficlim <- readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds")) %>% 
  bind_rows() %>% 
  group_by(plotcode,longitude,latitude,elevation) %>% 
  summarise(across(tmn01:Eref, diff)) %>% 
  ungroup() %>% 
  left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>%
  dplyr::filter(second_survey %in% 1997:1998)

bg_color <- "gray44" # background color

violin_plots <- lapply(clim_var, function(x) {
  
var_name <- extract_climatedt_metadata(var_clim = x) %>% 
  mutate(var_legend= paste0(description, " (", label,"; ",unit_symbol,")")) %>% 
  pull(var_legend)

subnficlim %>% 
  mutate(second_survey=as.factor(second_survey)) %>% 
  ggplot(aes(y = .data[[x]], x = second_survey)) + 
  geom_violin(alpha=0.2, fill="gray76", color="darkgreen") + 
  geom_jitter(shape=16,position=position_jitter(0.2),size=2,alpha=0.4) +
  xlab("") + 
  ggtitle(var_name) +
  ylab("mean climate under survey period - mean past climate") +
  theme_bw()

})

violin_plots 

These plots show that plots surveyed in 1998 exhibit a greater increase in mean annual temperature (bio1), a smaller increase in annual precipitation (bio12) and a stronger decrease in temperature seasonality (bio4) than plots surveyed in 1997. As temperature seasonality is the most important variable explaining allelic turnover in GF, this may explain the strong differences in GO predictions between plots surveyed in 1997 and 1998.
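The claim about variable importance can be checked directly on the fitted model: gradient forest implementations typically expose a per-predictor importance measure (in the `gradientForest` package, via `importance()` on the fitted object). The sketch below uses placeholder values, not results from the actual model, to show how one would identify the top-ranked predictor:

```r
# Placeholder importance values (NOT from the actual model). With a fitted
# gradientForest object `gf_model`, one would obtain them via
# importance(gf_model) instead of hard-coding them as here.
gf_importance <- c(bio4 = 0.21, bio12 = 0.14, bio1 = 0.11, bio3 = 0.07)
top_var <- names(sort(gf_importance, decreasing = TRUE))[1]
top_var  # the predictor contributing most to the modelled allelic turnover
```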

Session information

Code
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  fr_FR.UTF-8
 ctype    fr_FR.UTF-8
 tz       Europe/Paris
 date     2025-04-04
 pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package           * version   date (UTC) lib source
 abind               1.4-5     2016-07-21 [1] CRAN (R 4.4.0)
 arrayhelpers        1.1-0     2020-02-04 [1] CRAN (R 4.4.0)
 backports           1.5.0     2024-05-23 [1] CRAN (R 4.4.0)
 bit                 4.0.5     2022-11-15 [1] CRAN (R 4.4.0)
 bit64               4.0.5     2020-08-30 [1] CRAN (R 4.4.0)
 broom               1.0.6     2024-05-17 [1] CRAN (R 4.4.0)
 cachem              1.0.8     2023-05-01 [1] CRAN (R 4.4.0)
 car                 3.1-2     2023-03-30 [1] CRAN (R 4.4.0)
 carData             3.0-5     2022-01-06 [1] CRAN (R 4.4.0)
 cellranger          1.1.0     2016-07-27 [1] CRAN (R 4.4.0)
 checkmate           2.3.1     2023-12-04 [1] CRAN (R 4.4.0)
 class               7.3-23    2025-01-01 [4] CRAN (R 4.4.2)
 classInt            0.4-10    2023-09-05 [1] CRAN (R 4.4.0)
 cli                 3.6.2     2023-12-11 [1] CRAN (R 4.4.0)
 coda                0.19-4.1  2024-01-31 [1] CRAN (R 4.4.0)
 codetools           0.2-19    2023-02-01 [4] CRAN (R 4.2.2)
 colorspace          2.1-0     2023-01-23 [1] CRAN (R 4.4.0)
 cowplot           * 1.1.3     2024-01-22 [1] CRAN (R 4.4.0)
 crayon              1.5.2     2022-09-29 [1] CRAN (R 4.4.0)
 DBI                 1.2.2     2024-02-16 [1] CRAN (R 4.4.0)
 devtools            2.4.5     2022-10-11 [1] CRAN (R 4.4.0)
 digest              0.6.35    2024-03-11 [1] CRAN (R 4.4.0)
 distributional      0.4.0     2024-02-07 [1] CRAN (R 4.4.0)
 dplyr             * 1.1.4     2023-11-17 [1] CRAN (R 4.4.0)
 e1071               1.7-14    2023-12-06 [1] CRAN (R 4.4.0)
 ellipsis            0.3.2     2021-04-29 [1] CRAN (R 4.4.0)
 evaluate            0.23      2023-11-01 [1] CRAN (R 4.4.0)
 fansi               1.0.6     2023-12-08 [1] CRAN (R 4.4.0)
 farver              2.1.2     2024-05-13 [1] CRAN (R 4.4.0)
 fastmap             1.1.1     2023-02-24 [1] CRAN (R 4.4.0)
 forcats           * 1.0.0     2023-01-29 [1] CRAN (R 4.4.0)
 fs                  1.6.4     2024-04-25 [1] CRAN (R 4.4.0)
 generics            0.1.3     2022-07-05 [1] CRAN (R 4.4.0)
 geodist           * 0.0.8     2022-10-19 [1] CRAN (R 4.4.0)
 ggdist              3.3.2     2024-03-05 [1] CRAN (R 4.4.0)
 ggplot2           * 3.5.1     2024-04-23 [1] CRAN (R 4.4.0)
 ggpubr            * 0.6.0.999 2024-06-05 [1] Github (kassambara/ggpubr@6aeb4f7)
 ggrepel           * 0.9.5     2024-01-10 [1] CRAN (R 4.4.0)
 ggsignif            0.6.4     2022-10-13 [1] CRAN (R 4.4.0)
 glue                1.7.0     2024-01-09 [1] CRAN (R 4.4.0)
 goftest             1.2-3     2021-10-07 [1] CRAN (R 4.4.0)
 gridExtra           2.3       2017-09-09 [1] CRAN (R 4.4.0)
 gtable              0.3.5     2024-04-22 [1] CRAN (R 4.4.0)
 here              * 1.0.1     2020-12-13 [1] CRAN (R 4.4.0)
 highr               0.10      2022-12-22 [1] CRAN (R 4.4.0)
 hms                 1.1.3     2023-03-21 [1] CRAN (R 4.4.0)
 htmltools           0.5.8.1   2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets         1.6.4     2023-12-06 [1] CRAN (R 4.4.0)
 httpuv              1.6.15    2024-03-26 [1] CRAN (R 4.4.0)
 httr                1.4.7     2023-08-15 [1] CRAN (R 4.4.0)
 inline              0.3.19    2021-05-31 [1] CRAN (R 4.4.0)
 janitor           * 2.2.0     2023-02-02 [1] CRAN (R 4.4.0)
 jsonlite            1.8.8     2023-12-04 [1] CRAN (R 4.4.0)
 kableExtra        * 1.4.0     2024-01-24 [1] CRAN (R 4.4.0)
 KernSmooth          2.23-26   2025-01-01 [4] CRAN (R 4.4.2)
 knitr             * 1.46      2024-04-06 [1] CRAN (R 4.4.0)
 labeling            0.4.3     2023-08-29 [1] CRAN (R 4.4.0)
 later               1.3.2     2023-12-06 [1] CRAN (R 4.4.0)
 latex2exp         * 0.9.6     2022-11-28 [1] CRAN (R 4.4.0)
 lattice             0.22-5    2023-10-24 [4] CRAN (R 4.3.1)
 lifecycle           1.0.4     2023-11-07 [1] CRAN (R 4.4.0)
 lmom                3.0       2023-08-29 [1] CRAN (R 4.4.0)
 lmomco              2.4.14    2024-02-19 [1] CRAN (R 4.4.0)
 Lmoments            1.3-1     2019-03-15 [1] CRAN (R 4.4.0)
 loo                 2.7.0     2024-02-24 [1] CRAN (R 4.4.0)
 lubridate         * 1.9.3     2023-09-27 [1] CRAN (R 4.4.0)
 magrittr          * 2.0.3     2022-03-30 [1] CRAN (R 4.4.0)
 MASS                7.3-64    2025-01-04 [4] CRAN (R 4.4.2)
 matrixStats         1.3.0     2024-04-11 [1] CRAN (R 4.4.0)
 memoise             2.0.1     2021-11-26 [1] CRAN (R 4.4.0)
 mime                0.12      2021-09-28 [1] CRAN (R 4.4.0)
 miniUI              0.1.1.1   2018-05-18 [1] CRAN (R 4.4.0)
 munsell             0.5.1     2024-04-01 [1] CRAN (R 4.4.0)
 paletteer         * 1.6.0     2024-01-21 [1] CRAN (R 4.4.0)
 pillar              1.9.0     2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild            1.4.4     2024-03-17 [1] CRAN (R 4.4.0)
 pkgconfig           2.0.3     2019-09-22 [1] CRAN (R 4.4.0)
 pkgload             1.3.4     2024-01-16 [1] CRAN (R 4.4.0)
 plyr                1.8.9     2023-10-02 [1] CRAN (R 4.4.0)
 posterior           1.5.0     2023-10-31 [1] CRAN (R 4.4.0)
 profvis             0.3.8     2023-05-02 [1] CRAN (R 4.4.0)
 promises            1.3.0     2024-04-05 [1] CRAN (R 4.4.0)
 proxy               0.4-27    2022-06-09 [1] CRAN (R 4.4.0)
 purrr             * 1.0.2     2023-08-10 [1] CRAN (R 4.4.0)
 QuickJSR            1.1.3     2024-01-31 [1] CRAN (R 4.4.0)
 R6                  2.5.1     2021-08-19 [1] CRAN (R 4.4.0)
 ragg                1.3.0     2024-03-13 [1] CRAN (R 4.4.0)
 raster              3.6-26    2023-10-14 [1] CRAN (R 4.4.0)
 Rcpp                1.0.12    2024-01-09 [1] CRAN (R 4.4.0)
 RcppParallel        5.1.7     2023-02-27 [1] CRAN (R 4.4.0)
 readr             * 2.1.5     2024-01-10 [1] CRAN (R 4.4.0)
 readxl            * 1.4.3     2023-07-06 [1] CRAN (R 4.4.0)
 rematch2            2.1.2     2020-05-01 [1] CRAN (R 4.4.0)
 remotes             2.5.0     2024-03-17 [1] CRAN (R 4.4.0)
 reshape             0.8.9     2022-04-12 [1] CRAN (R 4.4.0)
 reshape2          * 1.4.4     2020-04-09 [1] CRAN (R 4.4.0)
 rlang               1.1.4     2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown           2.26      2024-03-05 [1] CRAN (R 4.4.0)
 rnaturalearth     * 1.0.1     2023-12-15 [1] CRAN (R 4.4.0)
 rnaturalearthdata   1.0.0     2024-02-09 [1] CRAN (R 4.4.0)
 rprojroot           2.0.4     2023-11-05 [1] CRAN (R 4.4.0)
 rstan             * 2.32.6    2024-03-05 [1] CRAN (R 4.4.0)
 rstatix             0.7.2     2023-02-01 [1] CRAN (R 4.4.0)
 rstudioapi          0.16.0    2024-03-24 [1] CRAN (R 4.4.0)
 scales              1.3.0     2023-11-28 [1] CRAN (R 4.4.0)
 sessioninfo         1.2.2     2021-12-06 [1] CRAN (R 4.4.0)
 sf                  1.0-16    2024-03-24 [1] CRAN (R 4.4.0)
 shiny               1.8.1.1   2024-04-02 [1] CRAN (R 4.4.0)
 snakecase           0.11.1    2023-08-27 [1] CRAN (R 4.4.0)
 sp                  2.1-4     2024-04-30 [1] CRAN (R 4.4.0)
 SPEI              * 1.8.1     2023-03-02 [1] CRAN (R 4.4.0)
 StanHeaders       * 2.32.7    2024-04-25 [1] CRAN (R 4.4.0)
 stringi             1.8.4     2024-05-06 [1] CRAN (R 4.4.0)
 stringr           * 1.5.1     2023-11-14 [1] CRAN (R 4.4.0)
 svglite             2.1.3     2023-12-08 [1] CRAN (R 4.4.0)
 svUnit              1.0.6     2021-04-19 [1] CRAN (R 4.4.0)
 systemfonts         1.0.6     2024-03-07 [1] CRAN (R 4.4.0)
 tensorA             0.36.2.1  2023-12-13 [1] CRAN (R 4.4.0)
 terra               1.7-71    2024-01-31 [1] CRAN (R 4.4.0)
 textshaping         0.3.7     2023-10-09 [1] CRAN (R 4.4.0)
 tibble            * 3.2.1     2023-03-20 [1] CRAN (R 4.4.0)
 tidybayes         * 3.0.6     2023-08-12 [1] CRAN (R 4.4.0)
 tidyr             * 1.3.1     2024-01-24 [1] CRAN (R 4.4.0)
 tidyselect          1.2.1     2024-03-11 [1] CRAN (R 4.4.0)
 tidyverse         * 2.0.0     2023-02-22 [1] CRAN (R 4.4.0)
 timechange          0.3.0     2024-01-18 [1] CRAN (R 4.4.0)
 TLMoments           0.7.5.3   2022-03-27 [1] CRAN (R 4.4.0)
 tzdb                0.4.0     2023-05-12 [1] CRAN (R 4.4.0)
 units               0.8-5     2023-11-28 [1] CRAN (R 4.4.0)
 urlchecker          1.0.1     2021-11-30 [1] CRAN (R 4.4.0)
 usethis             2.2.3     2024-02-19 [1] CRAN (R 4.4.0)
 utf8                1.2.4     2023-10-22 [1] CRAN (R 4.4.0)
 vctrs               0.6.5     2023-12-01 [1] CRAN (R 4.4.0)
 viridisLite         0.4.2     2023-05-02 [1] CRAN (R 4.4.0)
 vroom               1.6.5     2023-12-05 [1] CRAN (R 4.4.0)
 withr               3.0.0     2024-01-16 [1] CRAN (R 4.4.0)
 xfun                0.43      2024-03-25 [1] CRAN (R 4.4.0)
 xml2                1.3.6     2023-12-04 [1] CRAN (R 4.4.0)
 xtable            * 1.8-4     2019-04-21 [1] CRAN (R 4.4.0)
 yaml                2.3.8     2023-12-11 [1] CRAN (R 4.4.0)
 zoo                 1.8-12    2023-04-13 [1] CRAN (R 4.4.0)

 [1] /home/juliette/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

References

Changenet, Alexandre, Paloma Ruiz-Benito, Sophia Ratcliffe, Thibaut Fréjaville, Juliette Archambeau, Annabel J Porte, Miguel A Zavala, Jonas Dahlgren, Aleksi Lehtonen, and Marta Benito Garzón. 2021. “Occurrence but Not Intensity of Mortality Rises Towards the Climatic Trailing Edge of Tree Species Ranges in European Forests.” Global Ecology and Biogeography. https://onlinelibrary.wiley.com/doi/abs/10.1111/geb.13301.